ABSTRACT


This report presents a model to be used for database design. Because our motivation extends
to  providing  guidance  for  the  structured  implementation  of  a database,  we  call  our  model  the 
Structural  Model.  We  derive  the  design  using  criteria  of  correctness,  relevance,  and  performance 
from  semantic  and  operational  specifications  obtained  from  multiple  sources.  These  sources  typi- 
cally correspond  to  prospective  users  or  user  groups  of  the  database.  The  integration  of  such 
specifications  is  a central  issue  in  the  development  of  an  integrated  structural  database  model. 

The  structural  model  is  used  for  the  design  of  the  logical  structures  that  represent  a real- 
world  situation.  However,  it  is  not  meant  to  represent  all  possible  real-world  semantics,  but  a 
subset  of  the  semantics  which  are  important  in  database  modelling. 

The  model  uses  relations  as  building  blocks,  and  hence  can  be  considered  as  an  extension 
of Codd's relational model [Codd70]. The main extensions to the relational model are the explicit
representation of logical connections between relations, the inclusion of insertion-deletion
constraints  in  the  model  itself,  and  the  separation  of  relations  into  several  structural  types. 

Connections  between  relations  are  used  to  represent  existence  dependencies  of  tuples  in 
different  relations.  These  existence  dependencies  are  important  for  the  definition  of  semantics  of 
relationships  between  classes  of  real-world  entities.  The  connections  between  relations  are  used  to 
specify  these  existence  dependencies,  and  to  ensure  that  they  remain  valid  when  the  database  is 
updated.  Hence,  connections  implicitly  define  a basic,  limited  set  of  integrity  constraints  on  the 
database,  those  that  identify  and  maintain  existence  dependencies  among  tuples  from  different 
relations.  Consequently,  the  rules  for  the  maintenance  of  the  structural  integrity  of  the  model 
under  insertion  and  deletion  of  tuples  are  easy  to  specify. 

Structural  relation  types  are  used  to  specify  how  each  relation  may  be  connected  to  other 
relations  in  the  model.  Relations  are  classified  into  five  types:  primary  relations,  referenced  rela- 
tions, nest  relations,  association  relations,  and  lexicon  relations.  The  motivation  behind  the  choice 
of  these  relation  types  is  discussed,  as  is  their  use  in  data  model  design. 

A methodology  for  combining  multiple,  overlapping  data  models  — also  called  user  views 
in  the  literature  — is  associated  with  the  structural  model.  The  database  model,  or  conceptual 
schema,  which  represents  the  integrated  database,  may  thus  be  derived  from  the  individual  data 
models  of  the  users.  We  believe  that  the  structural  model  can  be  used  to  represent  the  data 
relationships  within  the  conceptual  schema  of  the  ANSI/SPARC  DBMS  model  since  it  can  support 
database  submodels,  also  called  external  schema,  and  maintain  the  integrity  of  the  submodels  with 
respect  to  the  integrity  constraints  expressable  in  the  structural  model. 

We  then  briefly  discuss  the  use  of  the  structural  model  in  database  design  and  implementation. 
The  structural  model  provides  a tool  to  deal  effectively  with  the  complexity  of  large,  real-world 
databases. 

We  begin  this  report  with  a very  short  review  of  existing  database  models.  In  Chapter  2,  we 
state  the  purpose  of  the  model,  and  in  Chapter  3 we  describe  the  structural  model,  first  informally 
and  then  using  a formal  framework  based  on  extensions  of  the  relational  model.  Chapter  4 defines 
the  representations  we  use,  and  Chapter  5 covers  the  integration  of  data  models  that  represent  the 
different user specifications into an integrated database model. Formal descriptions and examples
of  the  prevalent  cases  are  given. 

The  work  is  then  placed  into  context  first  relative  to  other  work  (Chapter  6)  and  then  briefly 
within  our  methodology  for  database  design  (Chapter  7). 
1. CURRENT STATE OF DATA MODELS


Database  systems  have  become  a major  topic  of  interest  because  of  their  widespread  use  in 
industry,  commerce,  government,  and  educational  institutions  [Steel74,  Sibley76,  Fry78].  Several 
data  models  have  been  proposed  to  represent  the  structure  of  databases.  The  most  widely  discussed 
models  are  the  relational  model  [Codd70],  the  hierarchical  model  [Tsichritzis76],  and  the  network 
model  (derived  from  the  CODASYL  database  system  specification  [CODASYL74]).  The  majority 
of  implemented  database  systems  use  one  of  the  above  models.  For  an  excellent  introduction  to 
these  three  database  models,  see  [CompSurv76]. 


1.1.  The  relational  model: 

The  relational  model  is  formed  from  relations.  Each  relation  is  composed  of  a set  of  struc- 
turally identical  tuples.  Tuples  are  composed  of  related  data  elements.  For  each  relation,  a relation 
description,  or  schema,  defines  the  attributes  and  the  possible  values  for  the  data  elements  that  each 
tuple in the relation may take. The set of tuples in a relation is described using the mathematical
theory  of  relations,  augmented  with  the  concept  of  functional  dependency  among  attributes.  The 
mathematical  basis  of  the  relational  model,  the  uniform  representation  of  all  structures  as  relations, 
and  the  syntactic  clarity  of  the  data  model  schema  provide  important  advantages  for  model  and 
query  analysis. 

The  relational  model  has  been  subjected  to  intensive  theoretical  scrutiny.  Third  normal 
form [Codd72] and Boyce-Codd normal form [Codd74] have been defined to design relations with
favorable  update  properties.  Bernstein  [Bernstein75]  describes  an  algorithm  for  synthesis  of  third 
normal  form  relations  from  functional  dependencies.  Fagin  [Fagin77]  introduced  multivalued  de- 
pendencies and  a fourth  normal  form  for  relations  to  extend  the  understanding  of  the  logical  design 
of  relational  databases. 

When  relations  are  built  solely  from  the  functional  or  multivalued  dependencies  among  all 
attributes  in  the  data  model,  several  possible  logical  data  models  can  be  derived  [Bernstein75, 
Fagin77,  Chang78,  Delobel78].  Further,  some  of  the  data  models  will  not  have  a direct  correspon- 
dence with  the  actual  real-world  situation  being  modelled  [Schmid75].  Then  the  database  designer, 
or  some  automatic  procedure,  has  to  choose  the  most  suitable  model. 

A drawback  of  the  basic  relational  model  is  that  known  relationships  among  entities  of  the 
situation being modelled are not explicitly represented but have to be recognized at query processing
time  by  matching  attributes  that  have  the  same  domain.  This  requires  recognition  of  similar 
domains,  using  the  schema,  as  well  as  some  computation  within  the  database  to  match  data  ele- 
ments. Also,  logical  integrity  constraints  are  not  defined  within  the  model,  but  are  left  to  be  defined 
by  the  database  implementors.  In  one  approach,  integrity  constraints  are  described  by  assertions 
[Stonebraker74, Eswaran75].


1.2.  The  hierarchical  model: 

The  hierarchical  model  represents  classes  of  entities  and  hierarchical  relationships  among 
different  entity  classes.  A class  of  entities  is  represented  as  a record  type,  and  the  hierarchical 
relationships  are  represented  by  a tree  structure,  with  record  types  as  nodes  in  the  tree.  The  record 
type  represents  the  attributes  of  a class  of  entities,  while  each  record  represents  a particular  entity 
of  the  class,  and  is  composed  of  data  items  that  describe  the  entity. 

Each  record  is  owned  by  only  one  record  of  the  record  type  at  the  level  above  it  in  the  tree, 
and  can  own  in  turn  any  number  of  records  of  the  record  types  below  it,  if  any.  Many  real  world 
situations  are  naturally  hierarchical,  and  are  thus  well  represented  by  a hierarchical  model.  In 
particular, individual user views, or data models, are often hierarchical. Databases used by multiple
users often need a more complex model. In the hierarchical model, non-hierarchical relationships
are represented in an awkward and non-symmetric fashion by defining duplicate record types and
using  pointers. 


1.3.  The  network  model: 

The  network  model  allows  representation  of  non-hierarchical  relationships  among  entity 
classes.  A record  type  may  be  owned  by  more  than  one  record  type,  leading  to  a network  rep- 
resentation of  relationships  among  entity  classes.  This  permits  a direct  representation  of  m:n 
relationships  among  entity  classes.  The  concept  of  a link-set  between  two  record  types  is  introduced. 
A link-set  groups  together  records  of  one  record  type,  the  member  record  type,  that  are  owned  by 
a particular  record  of  a different  record  type,  the  owner  record  type.  Existence  dependencies  to 
govern  occurrences  of  owner  and  member  records  of  a link-set  are  specified  by  different  types  of 
link-sets, such as manual and automatic.

The database administrator may specify the access structure used for implementing a link-set
as a chain of pointers or a pointer array, or may specify that the records be stored physically
adjacent. Thus access to the records in a particular link-set via the owner record can be very
efficient.  However,  the  database  designer  has  to  recognize  and  define  the  link-set  and  its  access 
structure  a priori,  and  queries  based  on  structures  not  directly  implemented  may  be  quite  costly 
to  process. 

A drawback  of  the  network  model  is  that  only  implemented  relationships  can  be  exploited, 
and  that,  due  to  implementation  constraints,  certain  relationships  are  difficult  to  express  (such  as 
recursive  sets  [Taylor76],  which  are  relationships  between  records  of  the  same  record  type).  Another 
criticism  is  that  it  is  too  implementation  oriented,  and  thus  provides  limited  data  independence 
[Engles89]. 


1.4.  Some  other  data  models: 

The  problems  with  the  relational,  hierarchical  and  network  models  have  led  to  active  research 
in data models. Chang [Chang78] has developed an approach with a “database skeleton” which
includes semantic information about the relationships between database relations, and defines the
relationships over a time frame using the concept of the “state” of the database. The semantic
information is used by the system in query translation, and incomplete or “fuzzy” queries may
be  processed.  Manacher  [Manacher75]  differentiates  relationships  into  several  semantic  categories. 
Abrial  [Abrial74]  goes  further  by  distinguishing  every  relationship  according  to  its  particular 
semantic  notion,  but  states  that  his  model  would  be  too  complicated  for  database  construction. 

Chen  [Chen76]  has  proposed  a model  based  on  the  relational  model  which  clearly  distinguishes 
relations  into  two  types:  entities  and  relationships  among  the  entities.  Integrity  rules  for  logical 
consistency  are  considered  for  the  relation  types,  but  are  not  part  of  the  model.  Schmid  and 
Swenson  [Schmid75]  develop  the  semantics  of  the  relational  model,  and  show  that,  in  the  context 
of  their  model,  relations  in  third  normal  form  can  be  differentiated  into  five  semantic  types.  Rules 
for  insertion  and  deletion  of  tuples  are  given. 

More recently, models have been introduced that provide a more detailed semantic description
of the situation being modelled [Smith77, Hammer76, Navathe78]. In these papers, constructs are
introduced  to  represent  subsets  of  entity  classes  in  the  data  model.  These  subsets  have  a semantic 
significance  in  the  data  model,  such  as  certain  identifying  properties  that  make  them  different 
from  other  entities  in  the  class. 
The requirement to have a model which describes the data relationships independently of
implementation concerns was recognized when standardization of the CODASYL model was suggested.
The ANSI/X3/SPARC committee [Steel75] has described a DBMS architecture in response
to the perceived long range needs. A principal component of the architecture is the conceptual
schema, which is to contain essential information about the database itself. The conceptual schema
would  be  augmented  by  an  internal  schema  to  define  the  implementation,  and  by  possibly  several 
external  schemas  to  represent  the  transformations  of  the  database  to  the  views  desired  by  the 
users. 
2. PURPOSE OF THE STRUCTURAL MODEL


The  numerous  data  models  presented  in  the  literature  have  given  insight  into  the  process  of 
logical data model design, and the implemented relational, hierarchical and network database systems
have provided experience on both logical and physical database design and implementation.
The  model  presented  here  is  intended  to  assist  in  the  development  of  a conceptual  data  model  inde- 
pendent of  any  implementation,  but  also  to  provide  a framework  for  database  implementation.  We 
propose  that  the  model  satisfies  the  criteria  [Kent77]  for  representing  the  relationships  within  the 
conceptual  schema  of  a database  system  that  has  an  architecture  similar  to  the  ANSI/X3/SPARC 
DBMS  architecture. 

The  structural  model  which  we  present  here: 

(1)  avoids  the  storage  structure  dependency  and  the  limitations  of  the  hierarchical  and  network 
models, 

(2)  introduces  semantic  information  to  the  relational  model  by  the  representation  of  logical 
connections  between  relations  which  also  define  structural  integrity  constraints  in  the  model 
itself, 

(3)  allows  a precise  representation  of  the  semantics  of  relationships  between  entity  classes,  and 

(4) provides a framework for the design of a database system, from the design of individual
users' data models, through the integration of the data models to form a global database
model, to guidance in the choice of database implementation structures.

Associated  with  this  structural  model  is  a methodology  to  combine  multiple,  related  data 
models  to  form  an  integrated  database  model,  and  to  design  the  data  models  to  match  closely  the 
real-world  situation  being  represented.  The  individual  data  models  also  allow  the  user  to  specify 
some  of  his  requirements  of  the  database  system. 

The  model  we  present  is  built  from  relations,  augmented  with  two  additional  basic  concepts. 
First  we  associate  a relation  type  with  each  relation.  Second  we  associate  connection  types  with  the 
relation  types  which  define  the  structural  integrity  of  this  relation  with  respect  to  other  relations 
that  are  logically  related  to  it  in  the  model.  We  define  structural  integrity  to  be  the  maintenance 
of  a consistent  relationship  among  tuples  in  different  relations  of  the  data  model  as  defined  by  the 
connections  among  relations. 

During  the  design  and  integration  process,  the  relations  will  be  manipulated.  To  assure 
manipulatability,  we  require  all  relations  to  be  in  Boyce-Codd  normal  form.  However,  it  is  not 
necessary  to  build  the  relations  from  the  functional  dependencies  between  attributes.  Rather,  as 
also argued by Chen [Chen76], if we first define the logical entities and relationships from the real
world, then simple transformations will create a model where all relations are in third normal form.
Once  a relation  is  defined  with  all  its  attributes,  one  can  check  the  functional  dependencies  between 
the  attributes  of  the  relation.  If  a relation  is  not  in  third  normal  form,  it  may  be  transformed  into 
two  or  more  relations  in  third  normal  form  [Wiederhold77,  sec.7.2].  The  structural  model  prescribes 
how  the  data  model  relation  and  connection  types  will  represent  the  entities  and  relationships  of 
a particular  real-world  situation,  and  hence  limits  the  number  of  possible  data  models  that  may 
represent  a real-world  situation. 

We  note  here  that  the  structural  model  is  completely  independent  of  implementation  con- 
siderations. While  the  structural  model  does  represent  connections  between  relations,  it  does  not 
mandate  implementation  of  these  connections.  Rather,  the  connections  are  used  for  definition 
of  some  logical  integrity  constraints.  An  implementation  can  be  chosen  based  upon  an  existing 
relational,  hierarchical  or  network  database  management  system,  or  possibly  by  using  some  other 
approach. 
3. THE STRUCTURAL MODEL


3.1.  Real-World  Structures: 

A database system is used to model some aspect of the real world. People approach real-world
data  in  several  phases.  First,  they  observe  the  situation  and  collect  existing  data  that  describe 
the  situation.  Then,  from  their  observations,  they  classify  the  data  into  abstractions.  Next,  they 
assess the value of their abstractions in terms of how much they help them manage the world with a
minimum  of  exceptions.  Finally,  if  they  have  to  implement  a system,  they  describe  the  real-world 
situation  by  a data  model.  Such  a model  may  be  stored  on  some  physical  medium  (computer  or 
paper  files),  and  used  as  a guide  for  data  processing.  We  hence  introduce  a model  which  can  be 
used  to  represent  the  majority  of  real-world  situations  rather  than  a model  which  may  be  used  to 
represent  all  possible  real-world  semantics. 

The main building blocks of the data model are classes of entities, such as PEOPLE, CARS,
HOUSES, etc. An entity class is described by the primitive components that are used to describe
each  of  its  members,  the  properties.  For  example,  the  entity  class  CARS  can  have  the  properties 
LICENSE-NUMBER,  COLOR,  MODEL,  YEAR.  The  properties  that  identify  a specific  entity 
within  the  entity  class,  in  this  case  the  single  property  LICENSE-NUMBER,  are  called  the  ruling 
properties.  The  properties  that  describe  characteristics  of  an  entity,  in  this  case  COLOR,  MODEL, 
and  YEAR,  are  called  the  dependent  properties. 
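
As a minimal illustration of the distinction between ruling and dependent properties, the CARS example can be sketched in Python. The class `EntityClass`, its method names, and the sample entity values are hypothetical and are introduced here only for illustration; they are not part of the structural model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityClass:
    """Hypothetical description of an entity class (illustration only)."""
    name: str
    ruling: tuple      # properties that identify an entity within the class
    dependent: tuple   # properties that describe characteristics of an entity

    def properties(self):
        """All properties of the class, ruling properties first."""
        return self.ruling + self.dependent

    def identify(self, entity):
        """Return the values of the ruling properties of one entity."""
        return tuple(entity[p] for p in self.ruling)


CARS = EntityClass(
    name="CARS",
    ruling=("LICENSE-NUMBER",),
    dependent=("COLOR", "MODEL", "YEAR"),
)

# Made-up sample entity; the values are assumptions.
car = {"LICENSE-NUMBER": "ABC123", "COLOR": "red", "MODEL": "sedan", "YEAR": 1978}
```

Here `CARS.identify(car)` yields only the LICENSE-NUMBER value, which is what distinguishes this car from every other member of the class.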

Associated with each property is a domain, the set of values that it can take in any of the
entities  that  have  this  property.  Some  properties  may  be  repeating.  For  example,  consider  the  class 
of entities EMPLOYEES. One of the properties we may represent is the SALARY-HISTORY of
an  employee.  Each  employee  will  have  several  entries  of  the  salary  history,  one  for  each  salary  he 
had  during  his  previous  employment  period.  The  number  of  entries  is  variable  from  one  employee 
to  the  next.  The  SALARY-HISTORY  is  also  an  example  of  a compound  property,  one  which  is 
formed  of  several,  more  basic,  other  properties.  In  this  case,  SALARY-HISTORY  is  formed  from 
two  more  basic  attributes,  YEAR  and  SALARY-VALUE.  However,  such  compound  properties  can 
always  be  decomposed  into  several  of  the  basic  properties. 
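
The decomposition of a repeating, compound property can be sketched as follows. This is only an illustration under stated assumptions: the identifying property `EMPLOYEE-ID`, the function name, and the sample figures are hypothetical and do not come from the report.

```python
def decompose_salary_history(employees):
    """Flatten the repeating, compound SALARY-HISTORY property.

    Each (YEAR, SALARY-VALUE) entry becomes one row of simple values,
    prefixed by the (assumed) identifying property EMPLOYEE-ID.
    """
    rows = []
    for emp in employees:
        for year, salary_value in emp["SALARY-HISTORY"]:
            rows.append((emp["EMPLOYEE-ID"], year, salary_value))
    return rows


# Made-up sample data: the number of entries varies per employee.
employees = [
    {"EMPLOYEE-ID": 1, "SALARY-HISTORY": [(1976, 12000), (1977, 13000)]},
    {"EMPLOYEE-ID": 2, "SALARY-HISTORY": [(1977, 15000)]},
]

rows = decompose_salary_history(employees)
```

Every resulting row contains only basic properties, so the variable number of entries per employee shows up as a variable number of rows rather than as a nested value.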

We  also  have  to  model  the  relationships  that  exist  between  entity  classes.  A relationship  is  a 
mapping  among  classes.  Thus,  a relationship  defines  a rule  associating  an  entity  of  one  class  with 
entities  of  other  (not  necessarily  different)  classes.  Most  relationships  we  encounter  are  between  two 
entity  classes.  An  example  of  such  a relationship  is  CAR:OWNER  between  the  entity  classes  CARS 
and PEOPLE. Such relationships may be 1:1 (for example COUNTRY:PRESIDENT), 1:N (for example
MANAGER:EMPLOYEE), or M:N (for example STUDENT:CLASS). Other relationships
may be among more than two classes. For example, the relationship SUPPLIER:PART:PROJECT
is among three entity classes SUPPLIERS, PARTS, and PROJECTS.

A relationship  between  two  entity  classes  has  two  important  characteristics:  the  cardinality 
and the dependency. The cardinality of a relationship places constraints on the number of entities
of one class that can be related to a single entity of the other class. The dependency characteristic
of a relationship places constraints on whether an entity of one class can exist that is not related
to  any  entities  of  the  other  class.  We  will  discuss  these  characteristics  more  fully  in  section  4.1. 

Finally,  some  classes  of  entities  may  be  sub-classes  of  other  entity  classes.  For  example,  the 
entity  class  EMPLOYEES  is  a sub-class  of  the  entity  class  PEOPLE. 

The  data  model  should  reflect  the  real-world  structure  as  closely  as  possible.  This  makes  it 
easier  for  the  users  to  understand  the  model,  and  allows  useful  semantic  information  from  the  real 
world  to  be  included  in  the  data  model. 
In the structural model, relations are used to represent entity classes, and some types of
relationships between entity classes. Other relationships between entity classes are represented by
connections  between  relations.  Relations  will  be  categorized  into  several  types,  according  to  the 
structure  they  represent  in  a data  model.  Connections  between  relations  will  also  be  classified  into 
types,  and  possible  connections  between  relation  types  are  a part  of  the  model. 

Simple  properties  are  represented  by  attributes  of  relations.  We  will  always  decompose 
compound  properties  into  the  simple  properties  from  which  they  are  formed. 


3.2.  Relations  and  Connections: 

Relational  concepts  are  well  known,  but  for  conciseness  we  now  define  relations  and  relation 
schemas  as  we  use  them  in  the  structural  model.  Then  we  formally  define  the  concept  of  connections 
between  relations. 

In  order  to  define  a relation,  we  first  define  attributes,  tuples  of  attributes,  and  relation 
schemas.  Relation  schemas  specify  the  attributes  of  a relation.  Attributes  define  the  domains  from 
which  data  elements  that  form  the  tuples  of  the  relation  can  take  values. 

We  will  use  B,  C,  D,  to  denote  single  attributes;  X,  Y,  Z,  to  denote  sets  of  attributes;  b,  c, 
d, to denote values of single attributes; and, x, y, z, to denote tuples of sets of attributes. For
simplicity,  we  assume  that  all  sets  of  attributes  are  ordered. 

3.2.1.  Relations: 

Definition  1:  An  attribute  B is  a name  associated  with  a set  of  values,  DOM(B).  Hence,  a 
value  b of  attribute  B is  an  element  of  DOM(B). 

For an (ordered) set of attributes Y = (B_1, ..., B_m), we will write DOM(Y) to denote
DOM(B_1) × ... × DOM(B_m), where × is the cross product operation. Hence, DOM(Y) is the set
{<b_1, ..., b_m> | b_i ∈ DOM(B_i) for i = 1, ..., m}.
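
A small sketch may make the cross product concrete. The attribute names and the tiny domains below are assumptions chosen for illustration; real domains would usually be far larger or unbounded.

```python
from itertools import product

# Hypothetical attribute domains DOM(B) for two attributes.
DOM = {
    "COLOR": {"red", "blue"},
    "YEAR": {1977, 1978},
}


def dom_of(attrs):
    """Return DOM(Y) for an ordered attribute set Y as a set of tuples.

    The result is the cross product of the individual attribute domains,
    taken in the order in which the attributes are listed.
    """
    return set(product(*(DOM[a] for a in attrs)))


dom_y = dom_of(("COLOR", "YEAR"))
```

Because the attribute set is ordered, `("red", 1977)` is an element of `dom_y` while `(1977, "red")` is not.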

Definition 2: A tuple y of a set of attributes Y = (B_1, ..., B_m) is an element of DOM(Y).

Definition 3: A relation schema, R_s, of order m, m > 0, is a set of attributes Y = (B_1, ..., B_m).

The relation, R, is an instance (or current value) of the relation schema R_s, and is a subset
of  DOM(Y). 

Each  attribute  in  the  set  Y is  required  to  have  a unique  name. 

The set Y is partitioned into two subsets, K and G. The ruling part, K, of relation
schema R_s is a set of attributes K = (B_1, ..., B_k), k ≤ m, such that every tuple y in R
has a unique value for the tuple corresponding to the attribute set K. For simplicity, we
assume the set K is the first k attributes of Y. The dependent part, G, of relation schema
R_s (= Y) is the set of attributes G = Y − K, where − is the set difference operator.

All  relations  are  in  Boyce-Codd  normal  form.  (For  definitions  of  functional  depend- 
ency and  Boyce-Codd  normal  form,  see  section  6.1.) 

We will write R[Y] or R[B_1, ..., B_m] to denote that relation R is defined by the relation
schema Y = (B_1, ..., B_m).

Also, K(Y) will denote the ruling part of relation schema Y, and G(Y) will denote the dependent
part. Similarly, for a tuple y in relation R, defined by the relation schema Y, k(y) will
denote the tuple of values that correspond to the attributes K(Y) in y, and g(y) will denote the
tuple of values that correspond to G(Y) in y.
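
The ruling and dependent parts of a tuple, and the uniqueness requirement on the ruling part, can be sketched as below. The sketch assumes, as the text does for simplicity, that the ruling part consists of the first k attributes; the function names, the parameter `k_len`, and the sample relation are hypothetical.

```python
def k(y, k_len):
    """Values of the ruling part: the first k_len values of tuple y."""
    return tuple(y[:k_len])


def g(y, k_len):
    """Values of the dependent part: the remaining values of tuple y."""
    return tuple(y[k_len:])


def ruling_part_is_unique(relation, k_len):
    """Check that no two tuples of the relation share a ruling-part value."""
    keys = [k(y, k_len) for y in relation]
    return len(keys) == len(set(keys))


# Made-up relation over (LICENSE-NUMBER, COLOR, MODEL, YEAR), ruling part of size 1.
CARS = {
    ("ABC123", "red", "sedan", 1978),
    ("XYZ789", "blue", "coupe", 1977),
}
```

Adding a second tuple with the license number `"ABC123"` would make `ruling_part_is_unique(CARS, 1)` false, so such a set of tuples would not be a valid instance of the schema.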
A relation R[Y] may have several attribute subsets Z which satisfy the uniqueness requirement
for a ruling part. In the structural model, the ruling part of a relation schema is defined according
to  the  type  of  the  relation  (see  sec.  3.4). 

3.2.2.  Connections: 

We  now  define  the  concept  of  a connection  between  two  relations,  then  define  the  types  of 
connections  that  are  used  in  the  structural  model.  A connection  is  defined  between  two  relation 
schemas.  An  instance  of  the  connection  exists  between  two  tuples,  one  from  each  relation. 

Definition 4: A connection between relation schemas X_1 and X_2 is established by two sets of
connecting attributes Y_1 and Y_2 such that:

a. Y_1 ⊆ X_1.

b. Y_2 ⊆ X_2.

c. DOM(Y_1) = DOM(Y_2).

We then say that X_1 is connected to X_2 through (Y_1, Y_2).

Two  tuples,  one  from  each  relation,  are  connected  when  the  values  for  the  connecting 
attributes  are  the  same  in  both  tuples. 

The definition of connection is symmetric with respect to X1 and X2, and thus it is an
unordered pair.

Connections may be more complex. For example, if we desire a connection between two
sets of attributes with dissimilar, but related, domains, condition (c) above may be changed to
DOM(Y1) = f(DOM(Y2)). The function f will relate values of data elements from the two domains.
The equality condition in (c) above is the simplest case.
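
A minimal sketch of the tuple-level test, assuming tuples are dictionaries and that the function f
is applied value by value, could read as follows. The function name connected and the attribute
names CAR_ID and ENGINE are hypothetical.

```python
def connected(t1, y1, t2, y2, f=lambda v: v):
    """Return True when tuples t1 and t2 are connected through (y1, y2).

    y1, y2: equally long sequences of connecting attribute names.
    f: maps a value from the second domain into the first; the default
       identity function gives the plain equality case of condition (c).
    """
    if len(y1) != len(y2):
        raise ValueError("connecting attribute sets must have the same size")
    return all(t1[a] == f(t2[b]) for a, b in zip(y1, y2))


car = {"CAR_ID": 7, "MODEL": "T100"}
spec = {"MODEL": "T100", "ENGINE": "2.0L"}
spec_lower = {"MODEL": "t100", "ENGINE": "2.0L"}  # related but dissimilar domain
```

With the default f, car and spec are connected through (MODEL, MODEL), while car and spec_lower
are connected only when a relating function such as str.upper is supplied.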

The structural model uses three basic types of connections, which we now define. Associated
with each of the connection types is a set of integrity constraints that define the existence
dependency of tuples in the two connected relations. These constraints define the conditions for
the maintenance of the structural integrity of the model. We will define structural integrity, and
discuss these constraints in section 3.5.

Definition 5: A reference connection from relation schema X1 to relation schema X2 through
(Y1, Y2) is a connection between X1 and X2 through (Y1, Y2) such that:

a. Y2 = K(X2).

b. Y1 ⊆ K(X1), or Y1 ⊆ G(X1), but Y1 may not contain attributes from both K(X1)
and G(X1).

Definition 5a: A reference is an identity reference if Y1 = K(X1).

Definition 5b: A reference is a direct reference if it is not an identity reference.

Reference and direct reference are not defined symmetrically with respect to X1 and X2, and
thus are ordered pairs (X1, X2) when the reference is from X1 to X2. The identity reference is
defined symmetrically, but we still consider it to be ordered. This is because identity references are
used to represent a subrelation of a relation, defined in section 3.4.2, and we consider the reference
to be directed from the subrelation to the relation.

Definition 6: An ownership connection from relation schema X1 to relation schema X2 through
(Y1, Y2) is a connection between X1 and X2 through (Y1, Y2) such that:

a. Y1 = K(X1).

b. Y2 ⊂ K(X2).

The ownership connection is also non-symmetric with respect to X1 and X2, and is an ordered
pair (X1, X2) when the ownership connection is from X1 to X2.

Figure 1. Types of connections: (a) direct reference (X1, X2) from the ruling part of X1;
(b) direct reference (X1, X2) from the dependent part of X1; (c) identity reference (X1, X2);
(d) ownership connection (X1, X2).

The connections defined above may be represented graphically as in figure 1. They are
represented by directed arcs, with the § representing the "to" end of the connection. The ruling part
attributes in each relation are marked K, and separated from the dependent part attributes by
double lines ( || ).
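
The conditions of Definitions 5, 5a, 5b and 6 can be sketched as a small classification routine.
This is an illustration under our own assumptions: attribute sets are given as Python sets of
names, the function name classify_connection is hypothetical, and condition (c) of Definition 4
(equal domains) is taken for granted rather than checked.

```python
def classify_connection(K1, G1, Y1, K2, Y2):
    """Classify a connection from X1 to X2 through (Y1, Y2).

    K1, G1: ruling and dependent part of X1; K2: ruling part of X2.
    All arguments are collections of attribute names.
    """
    K1, G1, Y1, K2, Y2 = map(frozenset, (K1, G1, Y1, K2, Y2))
    if Y2 == K2 and Y1 and (Y1 <= K1 or Y1 <= G1):
        # Definition 5; 5a when Y1 is the whole ruling part, 5b otherwise.
        return "identity reference" if Y1 == K1 else "direct reference"
    if Y1 == K1 and Y2 < K2:
        # Definition 6: Y2 is a proper subset of the ruling part of X2.
        return "ownership"
    return "plain connection"
```

For instance, a connection from the dependent attribute MODEL of CARS to the ruling part of a
model specification relation is classified as a direct reference, while a connection from the
ruling part of EMPLOYEES to part of the ruling part of CHILDREN is classified as an ownership
connection.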

3.3. Types of relations:

Relations in the structural model are classified into structural types, which define their
interaction with other relations in the data model. Relations can also be classified semantically
according to the concept they represent from the real-world situation. One should be careful to
distinguish between the semantic and the structural role of a relation in a data model.

Semantically, we distinguish between classes of entities, properties of classes of entities, and
relationships  among  classes  of  entities.  Classes  of  entities  can  be  represented  by  several  structural 
relation  types,  depending  upon  their  relationship  with  other  classes  of  entities.  Hence,  entity  classes 
may  be  represented  by  either  primary  entity  relations,  referenced  entity  relations,  or  nest  relations, 
as  we  shall  see. 

Non-repeating  properties  of  a class  of  entities  are  represented  as  attributes  of  the  relation 
that  represents  the  entity  class.  Repeating  properties  of  a class  of  entities  are  represented  by  a 
nest  relation  owned  by  the  relation  that  represents  the  entity  class  (see  section  3.3.3). 

Relationships  among  entity  classes  can  also  be  represented  using  different  structures,  depend- 
ing upon  the  characteristics  of  the  relationship.  A relationship  between  two  entity  classes  may  be 
represented  by  an  ownership  connection,  a reference  connection,  or  two  connections  and  an  auxiliary 
relation.  This  auxiliary  relation  may  be  a primary  relation,  a nest  relation,  or  an  association 
relation  (see  section  4.1). 

Structurally,  relations  are  categorized  into  five  types:  primary  relations,  referenced  relations, 
nest  relations,  association  relations,  and  lexicon  relations.  These  are  all  relations  which  have  the 
same  form,  but  are  classified  according  to  their  connections  to  other  relations. 

In  this  section,  we  informally  present  the  rationale  behind  the  choice  of  the  different  structural 
relation  types.  We  give  formal  definitions  for  the  relation  types  in  section  3.4. 

A relation  in  the  data  model  which  represents  a class  of  entities  in  the  real-world  situation 
is  termed  an  entity  relation.  The  choice  of  entity  classes  is  a fundamental  aspect  of  the  data  model 
design  process.  The  goal  is  to  match  entity  relations  as  closely  as  possible  to  real-world  entity 
classes. 

Structurally,  entity  relations  may  be  primary,  referenced  or  nest  relations.  The  choice  of 
structural  type  to  represent  an  entity  relation  depends  upon  its  role  in  the  data  model.  In  the 
following  three  sections,  we  discuss  the  criteria  for  this  choice. 

3.3.1.  Primary  entity  relations: 

An  important  objective  of  the  data  model  is  to  represent  real-world  entities.  The  existence  of 
a tuple  in  the  data  model  which  represents  such  an  entity  is  hence  determined  by  the  existence  of 
the  actual  entity,  independently  from  other  modelling  considerations.  Classes  of  such  entities  are 
represented  in  the  data  model  by  primary  entity  relations.  Examples  of  primary  entity  relations 
are  EMPLOYEES  and  CARS.  Primary  entity  relations  should  be  chosen  to  be  update-independent 
of  other  relations  in  the  data  model.  An  update  of  another  relation  should  not  require  an  update 
of  a primary  entity  relation.  An  update  of  an  entity  relation,  however,  may  require  updates  to 
other  relations  connected  to  it,  as  we  shall  see  later. 

An  example  of  a primary  entity  relation  is  the  relation  EMPLOYEES  in  a model  that  rep- 
resents a company.  Updates  to  the  EMPLOYEES  relation  occur  only  from  outside  the  database. 
An  employee  tuple  is  inserted  whenever  a new  employee  is  hired  by  the  company,  and  deleted 
whenever  an  employee  leaves.  This  potentially  affects  several  other  relations  in  the  database  such 
as  CHILDREN  and  EMPLOYEES-DEPARTMENT.  Thus,  insertion  of  an  employee  tuple  involves 
the  possible  addition  of  tuples  to  other  relations  in  the  database  that  are  connected  to  the  employee 
relation,  such  as  tuples  that  represent  the  employee's  children  in  the  CHILDREN  relation,  and  tuples 
associating the employee with the departments he works for in the EMPLOYEES-DEPARTMENT
relation. Note that the number of additional tuples added to the database because of the insertion
of a new primary entity tuple is variable, and determined externally; the model only presents the
user with guidelines to follow when inserting a primary entity tuple.

The  deletion  of  a tuple  from  a primary  entity  relation  may  imply  the  deletion  of  related 
tuples from other relations in the database. Thus, the deletion of an employee tuple will involve
the  deletion  of  tuples  for  his  children  from  the  CHILDREN  relation,  as  well  as  tuples  associating 
him  with  the  department  he  worked  in  from  the  EMPLOYEES-DEPARTMENT  relation.  Such 
a deletion  does  not  involve  any  additional  checking  before  the  tuple  is  deleted,  since  a primary 
entity  relation  may  not  be  referenced  by  any  other  relation  in  the  data  model. 

3.3.2.  Referenced  entity  relations: 

When  representing  a real-world  situation,  one  often  encounters  abstractions  that  are  used 
mainly  to  describe  properties  of  other  entities.  Such  entities  are  referenced  by  other  entities  in 
the  model.  This  type  of  entity  is  a referenced  entity,  and  classes  of  such  entities  are  represented 
in  the  data  model  by  referenced  entity  relations.  Examples  of  referenced  entity  relations  are 
CAR  MODEL  SPECIFICATIONS,  referenced  by  the  attribute  MODEL  in  the  relation  CARS, 
and  JOB  DESCRIPTION,  referenced  by  the  JOB  attribute  of  the  relation  EMPLOYEES.  The 
use  of  these  referenced  entities  greatly  reduces  redundancy  in  the  data  model.  As  we  shall  see,  the 
main  difference  between  a primary  entity  relation  and  a referenced  entity  relation  in  the  structural 
model  is  in  their  update  characteristics. 

A direct  reference  connection  will  exist  from  some  relations  in  the  data  model,  termed  the 
referencing  relations,  to  the  referenced  entity  relation.  The  reference  connection  restricts  the  dele- 
tion of  tuples  in  the  referenced  entity  relation,  as  well  as  the  insertion  of  tuples  in  the  referencing 
relations.  We  discuss  these  restrictions  here  in  terms  of  an  example,  and  will  define  them  precisely 
in  section  3.4. 

An  example  of  a referenced  entity  relation  is  presented  with  respect  to  a company  database. 
Suppose the company wishes to keep track of current and possible suppliers for inventory items.
The  SUPPLIERS  relation  is  a referenced  entity  relation.  The  existence  of  supplier  tuples  is  deter- 
mined by  a selection  from  the  real-world,  since  the  company  maintains  a list  of  its  current  and 
possible  suppliers.  However,  a supplier  tuple  may  not  be  deleted  while  it  is  being  referenced  from 
the  INVENTORY  relation  within  the  data  model.  Thus,  the  deletion  of  tuples  from  a referenced 
entity  relation  requires  checking  the  tuples  in  all  relations  in  the  data  model  which  reference  this 
referenced  entity  relation.  Addition  of  tuples  to  the  referencing  relation,  the  INVENTORY  relation 
in  this  case,  is  restricted  to  those  tuples  that  reference  an  already  existing  supplier,  represented 
by  a tuple  in  the  SUPPLIERS  relation  in  the  database.  Thus,  the  name  of  a supplier  for  a new 
inventory  item  should  exist  in  the  SUPPLIERS  relation  before  the  new  referencing  tuple  is  added 
to  the  INVENTORY  relation. 

Tuples of referenced entity relations may be referenced from more than one relation. For
example, the SUPPLIERS relation may be referenced from the ACCOUNTS-PAYABLE relation,
describing unpaid bills, as well as from the INVENTORY relation. Note that supplier tuples may
exist which are not currently referenced from other tuples in the database, but one cannot delete a
supplier tuple without checking tuples in all relations that may reference the SUPPLIERS relation.
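
The two restrictions of this example can be sketched as follows. This is only an illustration:
the in-memory representation, the exception name ReferenceViolation, the function names, and
the attribute names ITEM and SUPPLIER are assumptions, not part of the model.

```python
class ReferenceViolation(Exception):
    """Raised when an update would break a reference connection."""


suppliers = {"Acme", "Bolt Co"}                     # ruling part values of SUPPLIERS
inventory = [{"ITEM": "nut", "SUPPLIER": "Acme"}]   # a referencing relation
accounts_payable = []                               # a second referencing relation

# Every relation that may reference SUPPLIERS must be listed here.
referencing = [inventory, accounts_payable]


def insert_referencing(relation, row):
    # Insertion rule: the referenced supplier tuple must already exist.
    if row["SUPPLIER"] not in suppliers:
        raise ReferenceViolation(f"unknown supplier {row['SUPPLIER']!r}")
    relation.append(row)


def delete_supplier(name):
    # Deletion rule: check all relations that may reference SUPPLIERS.
    for relation in referencing:
        if any(row["SUPPLIER"] == name for row in relation):
            raise ReferenceViolation(f"supplier {name!r} is still referenced")
    suppliers.discard(name)
```

In this sketch an unreferenced supplier such as "Bolt Co" can be deleted freely, whereas deleting
"Acme" is refused while an inventory tuple still references it.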

All  other  update  characteristics  for  referenced  entity  relations  are  the  same  as  the  update 
characteristics for primary entity relations. In the rest of this paper, when we use the term entity
relation  without  qualification,  we  will  mean  primary  or  referenced  entity  relation. 

3.3.3.  Nest  relations: 

Hierarchical  dependencies  occur  frequently  in  real-world  situations.  Hence,  real-world  entities 
will be represented in the data model whose existence directly depends upon the existence of
another entity. For example, in a company database, the CHILDREN relation represents children
of employees currently working in the company. The existence of children tuples in the company
database is justified while their parent works for the company, and the tuple representing the
parent exists in the EMPLOYEES relation. Such entities will be represented in the data model by
nest relations.

A nest  relation  always  corresponds  to  a 1:N  relationship  between  two  data  model  relations, 
the  owner  relation  and  the  nest  relation.  In  our  example,  the  EMPLOYEES  relation  is  said  to  own 
the  CHILDREN  relation.  This  1:N  relationship  is  represented  in  the  data  model  by  an  ownership 
connection  from  the  owner  relation  to  the  nest  relation. 

For  each  tuple  in  the  owner  relation,  a set  of  zero  or  more  tuples  will  exist  in  the  nest  relation 
that  are  connected  to  this  tuple.  The  existence  of  this  set  of  tuples  depends  upon  the  existence  of 
the owner tuple in the owner relation. The term 'nest relation' has been chosen because each owner
tuple  will  own  a 'nest'  of  tuples  in  the  nest  relation.  The  existence  of  individual  tuples  of  the  nest 
is  determined  by  the  real-world  requirements. 

Hierarchical  dependencies  also  occur  when  a class  of  entities  has  a repeating  property,  where 
the  number  of  repetitions  is  variable  for  each  entity  in  the  class.  We  then  represent  the  repeating 
properties  by  attributes  in  a nest  relation  that  is  owned  by  the  relation  representing  the  entity 
class.  An  example  is  the  education  history  attributes  of  an  employee  in  the  company  database. 
Here,  the  EMPLOYEES  relation  owns  the  nest  relation  EDUCATION  HISTORY.  In  the  structural 
model,  the  normalization  to  first  normal  form  forces  the  use  of  distinct  nest  relations,  but  the 
connection  to  the  owner  relation  remains  recognized. 

Insertion  of  a tuple  in  a nest  relation  is  contingent  upon  the  existence  of  the  owner  tuple 
in  the  owner  relation.  Thus,  one  may  not  insert  a child  or  an  education  history  tuple  without  a 
corresponding  owner  employee  tuple  in  the  EMPLOYEES  relation.  The  deletion  of  a tuple  from 
a nest  relation  is  not  restricted  by  the  ownership  connection.  The  deletion  of  a tuple  from  the 
owner  relation  requires  deletion  of  the  nest  of  tuples  owned  by  it  in  the  nest  relation.  Insertion  of 
tuples  in  the  owner  relation  may  involve  the  creation  and  insertion  of  a nest  of  tuples  in  the  nest 
relation. 
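
A minimal sketch of these rules for the EMPLOYEES and CHILDREN example follows. The dictionary
representation, the function names, and the attributes NAME, CHILD_NAME and AGE are assumptions
made for illustration.

```python
employees = {101: {"NAME": "Ada"}}        # owner relation, keyed by employee number
children = {(101, "Tim"): {"AGE": 4}}     # nest relation, keyed by (owner key, CHILD_NAME)


def insert_child(emp_no, child_name, data):
    # Insertion in the nest relation requires the owner tuple to exist.
    if emp_no not in employees:
        raise KeyError(f"no owner employee {emp_no}")
    children[(emp_no, child_name)] = data


def delete_child(emp_no, child_name):
    # Deletion from the nest relation is not restricted by the ownership connection.
    children.pop((emp_no, child_name), None)


def delete_employee(emp_no):
    # Deleting the owner tuple deletes the nest of tuples it owns.
    for key in [k for k in children if k[0] == emp_no]:
        del children[key]
    employees.pop(emp_no, None)
```

Note that the ruling part of the nest relation in this sketch is the pair formed by the owner's
key and an attribute distinguishing tuples within one nest.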

3.3.4.  Lexicon  Relations: 

A lexicon  relation  is  used  to  represent  a one-to-one  correspondence  between  two  sets  of  at- 
tributes. Most  frequently,  the  one-to-one  correspondence  will  be  between  only  two  single  attributes, 
but  sets  of  attributes  may  also  be  involved.  Examples  are  the  one-to-one  correspondence  between 
the  two  attributes  DEPARTMENT-NAME  and  DEPARTMENT-NUMBER  in  a company  data 
model, or that between the two sets of attributes {INSTRUCTOR, CLASS, SECTION} and
{ROOM, HOUR, DAYS} in a university data model. This one-to-one correspondence reflects a
similar  correspondence  between  properties. 

Such  one-to-one  correspondences  between  two  sets  of  attributes  occur  frequently,  and  isolat- 
ing lexicons  simplifies  the  data  model  considerably  by  transferring  attributes  that  serve  the  same 
function  into  a lexicon  relation.  One  set  of  attributes  can  represent  all  instances  of  either  set 
outside  of  the  lexicon  itself.  Which  set  of  attributes  remains  in  the  core  of  the  data  model  is  left 
to  the  judgment  of  the  model  designer. 

The  lexicon  relation  will  have  a reference  connection  to  it  from  every  relation  in  the  data 
model  that  includes  either  one  or  both  of  the  sets  of  attributes  in  the  lexicon.  The  reference 
connection  may  be  a direct  reference  or  an  identity  reference,  depending  on  the  situation. 

Lexicons  serve  another  important  function  in  the  data  model.  Frequently,  relations  will  have 
more  than  one  set  of  ruling  (or  key)  attributes.  A set  of  ruling  attributes  is  guaranteed  to  have 
a unique  value  for  any  tuple  in  the  relation,  and  thus  any  such  set  of  ruling  attributes  may  be 
used for tuple identification. In our model, each relation has one primary set of ruling attributes,
the ruling part of the relation. Other equivalent sets of ruling attributes are transferred to lexicon
relations.

The use of lexicons can greatly reduce the number of possible alternatives for the data model,
leading  to  a significant  simplification  of  the  model  design  process.  The  two  sets  of  attributes  in  a 
lexicon  relation  can  be  treated  conceptually  as  a single  attribute  in  intermediate  processes  which 
lead  to  the  design  of  the  data  model,  and  can  thus  be  considered  as  equivalent  in  the  data  model. 
Hence,  lexicon  relations  can  be  seen  as  a means  of  reducing  the  number  of  attributes  in  the  core 
of  the  data  model,  leading  to  the  creation  of  a clearer,  simpler  model. 

3.3.5.  Association  relations: 

We  finally  consider  relations  used  to  represent  the  interaction  between  two  or  more  relations 
in the data model. Such relations will be termed association relations. An association relation between
two  relations  associates  with  each  tuple  of  one  relation  a number  of  tuples  from  the  other  relation 
(possibly  none).  It  does  not  represent  any  existence  dependency  between  the  tuples  in  the  different 
relations,  but  only  an  association  between  existing  tuples. 

An  association  relation  of  order  i relates  tuples  from  i owner  relations.  Each  of  the  owner 
relations  has  an  ownership  connection  to  the  association  relation. 

An  example  of  an  association  of  order  2 is  the  relation  EMPLOYEE-PROJECT  which  relates 
an employee to the projects he works on, and vice-versa. Each project tuple and each employee
tuple  have  an  existence  of  their  own,  independently  from  the  tuples  in  the  association  relation.  A 
tuple  in  the  association  only  relates  an  employee  with  a project. 

An  example  of  an  association  of  order  3 is  the  SUPPLIER-PART-PROJECT  relation,  which 
relates  tuples  from  three  owner  relations. 

An  association  relation  is  used  to  represent  information  relevant  to  a relationship  between 
entity  classes.  Usually,  the  entity  classes  are  represented  by  the  i independent  relations.  Thus,  in 
our  example,  the  EMPLOYEE-PROJECT  association  may  include  information  about  the  job  the 
employee does for the project, the percentage of time he works on the project, etc. It is also
possible  for  association  relations  to  have  no  dependent  information.  In  this  case  the  association 
relation  is  used  only  for  relating  tuples  from  the  owner  relations  together. 

The update rules for an association relation and its owner relations are now self-evident: no
tuple  in  the  association  relation  may  be  created  if  there  are  no  corresponding  owner  tuples  in  the 
owner  relations,  and  deletion  of  a tuple  from  any  owner  relation  causes  the  deletion  of  all  tuples 
affiliated  with  it  from  the  association.  Note  that  the  deletion  rule  does  not  affect  the  existence  of 
the  tuples  related  to  the  deleted  tuple  in  the  other  owner  relations:  it  only  affects  those  tuples  in 
the  association  relation  that  serve  to  relate  these  tuples  together.  Thus,  deletion  of  an  employee 
would  not  affect  the  existence  of  any  of  the  projects  he  works  for. 
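
These update rules can be sketched for the EMPLOYEE-PROJECT association of order 2. The data
representation, the function names, and the dependent attribute PERCENT_TIME are assumptions
chosen for this sketch.

```python
employees = {1, 2}                       # ruling part values of the first owner relation
projects = {"P1", "P2"}                  # ruling part values of the second owner relation
employee_project = {                     # association, keyed by (employee, project)
    (1, "P1"): {"PERCENT_TIME": 50},
    (2, "P1"): {"PERCENT_TIME": 100},
}


def assign(emp, proj, data=None):
    # No association tuple may be created without both owner tuples.
    if emp not in employees or proj not in projects:
        raise KeyError("both owner tuples must exist")
    employee_project[(emp, proj)] = data or {}


def delete_employee(emp):
    # Deleting an owner tuple deletes all association tuples affiliated with it;
    # tuples in the other owner relation (projects) are left untouched.
    employees.discard(emp)
    for key in [k for k in employee_project if k[0] == emp]:
        del employee_project[key]
```

After delete_employee(1), the project "P1" still exists and remains associated with employee 2.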

3.4.  Formal  definition  of  relation  types: 

In  this  section,  we  formally  define  the  different  types  of  relations  discussed  in  section  3.3  in 
terms  of  their  connections  with  other  relation  types  in  the  data  model.  We  then  define  subrelations 
of  existing  relations,  and  how  a subrelation  is  connected  to  its  base  relation  in  section  3.4.2. 

For  the  remainder  of  the  paper,  we  will  use  the  term  relation  for  both  the  relation  schema 
and the relation, since the meaning is clear from the context.

3.4.1.  Basic  relation  types: 

Semantically, relations are classified into entity and non-entity relations.

Figure 2. A nest relation, R2


Definition  7:  An  entity  relation  is  a relation  R[X]  which  defines  a correspondence  between 
members of a class of real-world entities and the tuples in R[X].

The  ruling  part  of  an  entity  relation  defines  the  correspondence  to  the  class  of  real-world 
entities,  while  the  dependent  part  includes  the  attributes  that  describe  basic  properties  of  the 
entities. 

Structurally,  we  define  five  basic  types  of  relations: 

Definition  8:  A primary  relation  is  a relation  that  has  no  direct  references  or  ownership  con- 
nections to  it  from  any  other  relation  in  the  data  model. 

Primary  relations  are  required  to  have  no  references  or  ownership  connections  to  them.  Thus, 
deletion  of  tuples  from  primary  relations  is  unconstrained  by  the  data  model. 

Definition  9:  A referenced  relation  is  a relation  which  has  direct  references  to  it  from  some 
relations  in  the  data  model. 

The  ruling  part  attributes  K(R)  of  a referenced  relation,  R,  are  used  for  referencing  R from 
other  relations.  Hence,  each  relation  R'  that  references  R will  have  a set  of  referencing  attributes 
that  define  the  reference  connection  to  R.  This  constrains  insertion  and  deletion  of  tuples  in  both 
R and  R'. 

Insertion  of  a tuple  in  R should  precede  any  reference  to  it  from  a tuple  in  a referencing 
relation.  Deletion  of  a tuple  from  R involves  checking  that  it  is  not  referenced  by  any  tuples  from 
any of the relations that reference R. Insertion of a tuple in R' requires the existence of all tuples
that  it  references. 

Definition 10: A nest relation is a relation, R2, which has an ownership connection to it from
exactly one other relation, R1, in the data model. R1 is the owner of R2.

A nest relation R2 has an ownership connection to it from the owner relation, R1. Hence, the
ruling part K(R2) will consist of two parts: a set of attributes to define the connection with R1,
and additional attribute(s) which must uniquely identify tuples owned by the same owner tuple
in R1.

Insertion of tuples in R2 requires the existence of the owner tuple in R1. Deletion of tuples
from the nest relation may occur based on conditions determined externally from the database,
but may also be the result of deleting an owner tuple from R1, which requires deletion of all tuples
owned by it in R2.

Figure 3. An association relation, R, of order 2

Figure 4. A lexicon relation, R[X]


Definition 11: An association relation R of order i, i > 1, is a relation R that has i ownership
connections to it from i other relations in the data model, R1, ..., Ri such that:

a. each Rj has an ownership connection to R through (Xj, Yj) for j = 1, ..., i.

b. Yj ∩ Yk = ∅ for j ≠ k.

c. K(R) = Y1 ∪ ... ∪ Yi.

An  association  relation  of  order  i has  i ownership  connections  to  it,  one  from  each  of  the 
i owner  relations.  Hence,  the  domain  of  the  ruling  part  attributes  of  an  association  relation  is  a 
catenation  of  i sets  of  attributes,  each  set  defining  the  connection  to  one  of  the  owner  relations. 
A tuple  in  the  association  is  owned  by  one  tuple  from  each  of  the  owner  relations.  For  each  tuple 
in  an  owner  relation,  there  may  exist  zero,  one  or  many  owned  tuples  in  the  association. 

Deleting  a tuple  from  an  owner  relation  will  thus  require  the  deletion  of  all  tuples  owned  by 
it  in  the  association.  Insertion  of  a tuple  in  the  association  will  require  the  existence  of  the  i owner 
tuples. 
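
Conditions (b) and (c) of Definition 11 lend themselves to a mechanical check. The sketch below
is an illustration only; the function name and the attribute names EMP_NO, PROJ_NO and DATE are
hypothetical, and condition (a) is assumed to hold for each owner relation.

```python
def satisfies_definition_11(ruling_part, owner_links):
    """Check conditions (b) and (c) for an association relation R.

    ruling_part: the attributes of K(R).
    owner_links: one attribute set Yj per owner relation Rj.
    """
    links = [frozenset(Y) for Y in owner_links]
    if len(links) < 2 or not all(links):
        # The order i must exceed 1, and no connecting set may be empty.
        return False
    union = frozenset().union(*links)
    # (b): the Yj are pairwise disjoint exactly when their sizes add up to
    # the size of their union.
    pairwise_disjoint = sum(len(Y) for Y in links) == len(union)
    # (c): K(R) is the union of all the Yj.
    return pairwise_disjoint and union == frozenset(ruling_part)
```

For EMPLOYEE-PROJECT, the ruling part {EMP_NO, PROJ_NO} with connecting sets {EMP_NO} and
{PROJ_NO} passes the check; adding an extra ruling attribute that belongs to no owner connection
would violate condition (c).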

Definition 12: A lexicon relation R[X] between two sets of attributes Y1 and Y2 defines a 1:1
correspondence between DOM(Y1) and DOM(Y2) such that:

a. Y1 = K(X).

b. the set of attributes Y2 does not appear in any relation other than R.

c. Y1 ∩ Y2 = ∅, and Y1 ∪ Y2 = X.

d. R is referenced by one or more relations in the data model by identity or direct
references.

A lexicon will have reference connections to it from all the relations in the data model that
contain the set of attributes in the lexicon. The ruling part of a lexicon is the attribute set that exists
in the other relations in the model, and the dependent part is the other attribute set in the lexicon.

For example, if it is necessary to identify the department in several relations of the data model,
then either DEPARTMENT-NUMBER or DEPARTMENT-NAME would be chosen. To simplify
the model, an arbitrary single choice is made, say to use the attribute DEPARTMENT-NUMBER
in all relations of the model. Then, DEPARTMENT-NUMBER will be the ruling part of the
lexicon, and DEPARTMENT-NAME will be the dependent part. Every relation containing the
attribute DEPARTMENT-NUMBER will reference the lexicon.
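
The 1:1 correspondence maintained by a lexicon can be sketched as a pair of dictionaries. The
class name Lexicon, its method names, and the sample department values are assumptions made
for illustration.

```python
class Lexicon:
    """1:1 correspondence between a ruling attribute set and a dependent one."""

    def __init__(self):
        self._forward = {}    # ruling value -> dependent value
        self._backward = {}   # dependent value -> ruling value

    def add(self, ruling, dependent):
        # Reject any pair that would break the one-to-one correspondence.
        if ruling in self._forward or dependent in self._backward:
            raise ValueError("pair would violate the 1:1 correspondence")
        self._forward[ruling] = dependent
        self._backward[dependent] = ruling

    def dependent_of(self, ruling):
        return self._forward[ruling]

    def ruling_of(self, dependent):
        return self._backward[dependent]


# DEPARTMENT-NUMBER is the ruling part, DEPARTMENT-NAME the dependent part.
departments = Lexicon()
departments.add(10, "Sales")
departments.add(20, "Research")
```

Other relations would carry only the department number and consult the lexicon when the name is
needed, which is the simplification the text describes.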

The above definitions define the five structural types of relations: primary, referenced, nest,
association,  and  lexicon.  Connections  can  exist  at  any  level  in  the  model:  nest  relations  can  be 
owned  by  other  nest  relations,  by  associations,  or  by  referenced  entity  relations  as  well  as  by 
primary  entity  relations.  Similar  choices  exist  for  referenced  relations,  associations,  and  lexicons. 

A subrelation may be defined on any relation. In the following section, we define subrelations.

3.4.2.  Subrelations: 

A subrelation  S of  some  relation  R defines  a subset  of  the  tuples  in  R as  belonging  to  the 
subrelation.  This  subset  of  tuples  either  has  a semantic  significance  in  the  data  model,  or  has 
certain  additional  properties  that  have  to  be  represented,  but  that  are  not  represented  in  the  other 
tuples in R. The relation R is called the base relation of the subrelation S.

We  will  not  allow  duplication  of  information  in  the  representation  of  a subrelation,  other 
than  the  information  needed  for  tuple  identification.  Hence,  a subrelation  will  have  the  same  ruling 
part  attributes  as  the  base  relation,  and  will  be  connected  to  the  base  relation  through  an  identity 
reference  connection.  The  identity  reference  reflects  the  fact  that  a tuple  in  the  subrelation  that 
has  the  same  value  for  the  ruling  part  as  a tuple  in  the  base  relation  represents  the  same  entity 
in  the  data  model. 

All  attributes  other  than  the  ruling  part  attributes  of  the  subrelation  have  to  be  different 
from  the  attributes  of  the  base  relation. 

Definition 13: A (non-restriction) subrelation of relation R[X] is a relation S[Z] such that:

a. an identity reference exists from S to R.

b. for every tuple s in S, there exists a corresponding tuple x in R such that k(x) = k(s).

c. (Z − K(Z)) ∩ (X − K(X)) = ∅.

The relation R is called the base relation for subrelation S.

Definition 13a: A restriction subrelation of a relation R[X], restricting the set of attributes Y,
Y ⊂ X, to the subdomain D, D ⊆ DOM(Y), is a subrelation S[Z] of R such that: for
every tuple x in R that has as value for the set of attributes Y a tuple y in D, there
exists a corresponding tuple s in S such that k(s) = k(x).

An example of a restriction subrelation is a relation TECHNICAL EMPLOYEES, a
subrelation of the EMPLOYEES relation, restricting the attribute JOB of EMPLOYEES to the subdomain
{engineer, researcher, technician}, say.

Existence  of  tuples  in  a restriction  subrelation  is  totally  dependent  on  the  existing  tuples  in  its 
base  relation.  In  our  example,  all  employee  tuples  with  job  value  engineer,  researcher  or  technician 
must  also  exist  in  the  TECHNICAL  EMPLOYEES  subrelation,  while  all  other  employee  tuples 
cannot  exist  in  this  subrelation. 

An example of a non-restriction subrelation is a relation EMPLOYEES IN SPECIAL PROJECT
X. Existence of tuples in this subrelation is determined externally to the data model, but confined
to tuples in the base relation of all employees.

We will use subrelations to represent three cases:

(1)  When  a subset  of  a relation  has  a semantic  significance  within  the  data  model,  or  has 
additional  attributes  that  need  to  be  represented  in  the  model. 

(2)  When  integrity  constraints  require  a subset  of  a relation  to  own  a nest  relation  or  an 
association,  or  to  be  referenced  from  another  relation. 

(3)  When  we  combine  data  models  to  form  an  integrated  database  model  (see  section  5),  some 
data  models  may  represent  subsets  of  relations  represented  in  other  data  models.  This  has 
to  be  reflected  in  the  integrated  database  model. 

The update rules for the base relation and the subrelation are: when a tuple that belongs
to the subset represented by the subrelation is inserted in (deleted from) the base relation, the
corresponding tuple (having the same ruling part value) is inserted in (deleted from) the
subrelation. Also, if an update to a tuple in the base relation results in the removal of the tuple from
the subset, the corresponding tuple should be deleted from the subrelation. For example, if the
job of an employee tuple is changed from engineer to manager, the corresponding tuple in the
TECHNICAL EMPLOYEES subrelation should be deleted.
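
For the TECHNICAL EMPLOYEES restriction subrelation, these update rules can be sketched as
below. The dictionary representation and the function names are assumptions, and each base
tuple is reduced to its JOB attribute to keep the illustration short.

```python
TECHNICAL_JOBS = {"engineer", "researcher", "technician"}   # subdomain D of DOM(JOB)

employees = {}             # base relation R, keyed by ruling part value
technical_employees = {}   # restriction subrelation S, same ruling part


def _sync(emp_no):
    # Keep S equal to the set of base tuples whose JOB value lies in D.
    row = employees.get(emp_no)
    if row is not None and row["JOB"] in TECHNICAL_JOBS:
        technical_employees.setdefault(emp_no, {})
    else:
        technical_employees.pop(emp_no, None)


def upsert_employee(emp_no, job):
    # Insert a base tuple, or update its JOB, then propagate to the subrelation.
    employees[emp_no] = {"JOB": job}
    _sync(emp_no)


def delete_employee(emp_no):
    # Deleting a base tuple deletes the corresponding subrelation tuple.
    employees.pop(emp_no, None)
    _sync(emp_no)
```

Calling upsert_employee(1, "engineer") followed by upsert_employee(1, "manager") first adds
and then removes the corresponding tuple in the subrelation, as in the example above.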

3.5.  Maintaining  the  structural  integrity  of  the  data  model: 

Structural  integrity  exists  in  our  model  when  the  tuples  in  the  data  model  do  not  violate  the 
constraints  specified  by  the  connections  between  relations.  One  can  consider  that  the  structural 
model  contains  a basic  set  of  integrity  assertions  as  part  of  the  model.  The  integrity  assertions 
are  those  expressed  implicitly  by  the  connections  between  relations,  and  are  used  to  specify  the 
existence  dependencies,  and  hence  the  update  constraints,  of  tuples  in  connected  relations. 

We  do  not  specify  in  the  model  when  or  how  the  integrity  constraints  are  to  be  maintained  in 
an  implementation  of  the  data  model.  The  purpose  of  the  model  is  that  integrity  constraints  can 
be recognized, and that implementors can refer for guidance to the model. In practical implementations,
there may be intervals where the structural integrity rules do not hold. It should be known,
however, which structural integrity constraints have been violated and are awaiting correction.

Hierarchical  and  network  databases  tend  to  require  that  all  integrity  constraints  be  satisfied  for 
those  connections  that  are  actually  implemented.  Techniques  dealing  with  temporary  integrity 
violations  using  artificial  reference  tuples  are  indicated  in  [Wiederhold77]. 

Our model may appear less powerful than the original relational model since update integrity
violations can occur. In the pure relational model, inter-relation connections are not described, but
are left to be discovered at query-processing time. The lack of recognition of logical connections
between relations in a database model will simplify certain technical problems during update,
but does not eliminate semantic inconsistencies relative to knowledge models of the database
administrator or the user. Furthermore, in many situations it is best to discover and correct integrity
violations at the time of update rather than to try to cope with an inconsistent database at
query-processing time.

In  section  3.5.1,  we  list  the  integrity  constraints  specified  by  each  connection  type,  then  give  a 
summary  of  rules  for  maintenance  of  the  structural  integrity  for  each  of  the  relation  types.  We  then 
show  in  section  3.5.2  how  these  rules  may  be  expressed  as  simple  algorithms  for  maintaining  the 
structural  integrity  of  the  database  upon  insertion  and  deletion  of  tuples,  and  update  of  attribute 
values. 

3.5.1.  Update  constraints  in  the  structural  model: 

The  integrity  constraints  specified  by  the  connection  types  are  the  following: 

A direct reference connection from relation Ri to relation Rj specifies the constraints:

(1) Every tuple in Ri must reference an existing tuple in Rj.

(2) Deletion is restricted for tuples in Rj. Only tuples that are not referenced from any relation
in the data model may be deleted.

An ownership connection from relation Ri to relation Rj specifies the constraints:

(1) Every tuple in Rj must be owned by an existing tuple in Ri.

(2) Deletion of a tuple from Ri requires deletion of all owned tuples in Rj.

An identity reference connection from a subrelation Rj to its base relation R specifies the
constraints:

(1) Every tuple in Rj must reference an existing tuple in R.

(2) Deletion of a tuple from R requires deletion of the referencing tuple in Rj.

(3) If Rj is a restriction subrelation, then every tuple in R that belongs to the subrelation
(specified by the value of the restricting attributes in R) must exist in Rj.
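
Since the model allows intervals in which structural integrity does not hold, it can be useful to list which tuples currently violate constraint (1) of a connection. The following Python sketch shows one way to do so. The Connection class, the dangling function, and the dictionary layout of the database are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List

# relation name -> {ruling part value -> tuple}
Database = Dict[str, Dict[object, dict]]


@dataclass(frozen=True)
class Connection:
    kind: str    # "reference", "ownership", or "identity"
    source: str  # Ri (for "identity": the subrelation)
    target: str  # Rj (for "identity": the base relation)
    attr: str    # attribute of the dependent relation holding the other tuple's key


def dangling(db: Database, conn: Connection) -> List[object]:
    """Keys of tuples violating constraint (1) of the given connection."""
    if conn.kind == "ownership":
        # Ownership Ri -> Rj: tuples of Rj depend on an owner in Ri.
        dependent, required = conn.target, conn.source
    else:
        # Reference Ri -> Rj, or identity reference subrelation -> base.
        dependent, required = conn.source, conn.target
    return [key for key, row in db[dependent].items()
            if row[conn.attr] not in db[required]]
```

The deletion constraints (2) are not checked here; they are enforced at update time by the algorithms of section 3.5.2.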

We now give an informal listing of the update constraints associated with each relation type:

1.  Primary  relation: 

(a)  The  tuples  are  neither  owned  nor  referenced  by  other  tuples  in  the  data  model. 

(b)  Deletion  of  a tuple  requires  the  deletion  of  tuples  owned  by  it  in  nest  and  association 
relations. 

(c)  Insertion  of  a tuple  requires  the  existence  of  referenced  tuples  in  the  relations  referenced 
by  attribute  values  in  the  new  tuple. 

2.  Referenced  relation: 

(a)  The  tuples  are  referenced  from  other  tuples  in  the  data  model. 

(b)  The  ruling  part  defines  the  attributes  through  which  the  tuples  are  referenced  by  other 
tuples  in  the  data  model. 

(c) Deletion of a tuple is constrained by the existence of references to that tuple. Also, as in
1(b).

(d) As in 1(c).

3. Nest relation:

(a)  The  tuples  may  be  referenced  from  other  tuples  in  the  data  model. 

(b)  The  ruling  part  defines  a specific  owner  tuple,  and  a specific  tuple  within  the  nest  of  tuples 
that  has  the  same  owner  tuple. 

(c)  As  in  1(b).  If  the  relation  is  referenced,  deletion  is  constrained  by  existence  of  references 
to  the  tuple. 

(d)  Insertion  of  a tuple  requires  the  existence  of  the  owner  tuple  in  the  owner  relation,  and  the 
existence  of  referenced  tuples  in  relations  referenced  by  it. 

4.  Lexicon  relation: 

(a)  As  in  2.a. 

(b)  The  ruling  part  is  a set  of  attributes,  through  which  the  tuple  is  referenced. 

(c)  Deletion  of  tuples  is  constrained  by  the  existence  of  references  to  that  tuple. 

(d)  Insertion  of  a tuple  requires  no  checking. 

5. Association relation of order i:

(a) As in 3.a.

(b) The ruling part defines i specific owner tuples, one from each of the i owner relations.

(c) As in 3.c.

(d) Insertion of a tuple requires the existence of the i owner tuples in the i owner relations, and
the existence of referenced tuples in relations referenced by it.

6. Subrelation:

(a) As in 3.a.

(b) The ruling part attributes are used for referencing the base relation through an identity
reference.

(c) As in 3.c.

(d) Insertion and deletion of tuples in a restriction subrelation are totally controlled by existing
tuples in the base relation.

As indicated earlier, a relation may have more than one connection with other relations in
the data model. A nest relation may, for instance, itself be referenced, and may also reference tuples
of another referenced entity relation. In these cases, all connections impose constraints on the data
model.

3.5.2.  Data  model  update  algorithms: 

We  now  give  three  simple  algorithms  for  maintaining  the  structural  integrity  of  the  data 
model  by  observing  the  constraints  given  in  the  preceding  section.  The  algorithms  will  be  described 
in  terms  of  the  connection  types  defined  in  section  3.2.2. 

3.5.2.1. Tuple insertion algorithm:

Upon receipt of a request to insert a new tuple x in relation R:

a. Check the consistency of the new tuple with the current tuples in the database:

a.1. For every relation Ri referenced by R through a reference connection, verify that
the tuple y referenced by x exists in Ri.

a.2. For every relation Ri that has an ownership connection to R, verify that the owner
tuple y of x exists in Ri.

b. If the new tuple is consistent with the data model, insert it and, for every relation Rj owned
by R through an ownership connection, send a message to the user reminding him to insert
the tuples owned by x in Rj.

Thus  insertion  involves  two  actions:  checking  that  tuples  connected  with  the  new  tuple  exist 
in  the  data  model,  and  insertion  of  other  tuples  connected  with  the  new  tuple.  The  checking  can 
be  done  automatically,  but  insertion  of  other  new  tuples  will  in  most  cases  be  done  by  the  user. 

For  example,  the  insertion  of  an  employee  tuple  involves  insertion  of  his  children  in  a nest  relation 
CHILDREN  owned  by  the  EMPLOYEES  relation,  and  of  the  tuples  associating  the  employee  with 
the  department  he  works  for  in  the  EMPLOYEE-DEPARTMENT  association  relation,  also  owned 
by  EMPLOYEES.  However,  any  new  tuples  in  both  CHILDREN  and  EMPLOYEE-DEPARTMENT 
are  inserted  by  the  user.  The  system  only  reminds  the  user  that  such  data  may  exist,  and  if  they 
do  exist  they  should  be  added  to  the  data  model. 

In some cases, as when a nest relation represents repeating properties of an entity class, an
application program can be written to insert all properties of the entity simultaneously. Both a
tuple in the entity relation and its nest of tuples that represent the repeating property are inserted.
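
The insertion algorithm can be sketched in Python as follows. The sketch assumes that relations are dictionaries keyed by ruling part value and that the connections of each relation are listed in a hypothetical schema dictionary; the names insert_tuple and IntegrityError are likewise illustrative and are not prescribed by the model.

```python
class IntegrityError(Exception):
    """Raised when an insertion would violate structural integrity."""


def insert_tuple(db, schema, rel, key, row):
    """Insert `row` under `key` in relation `rel`; return reminders for the user.

    `schema[rel]` is a hypothetical description of the connections of `rel`:
      "references": {attribute: referenced relation}
      "owners":     {attribute: owner relation}
      "owns":       [owned relation, ...]
    """
    spec = schema.get(rel, {})
    # Step a.1: every referenced tuple must exist.
    for attr, target in spec.get("references", {}).items():
        if row[attr] not in db[target]:
            raise IntegrityError(f"{rel}.{attr}: no tuple {row[attr]!r} in {target}")
    # Step a.2: every owner tuple must exist.
    for attr, owner in spec.get("owners", {}).items():
        if row[attr] not in db[owner]:
            raise IntegrityError(f"{rel}.{attr}: no owner {row[attr]!r} in {owner}")
    # Step b: insert, then remind the user about owned relations.
    db[rel][key] = row
    return [f"insert tuples owned by {key!r} in {owned}"
            for owned in spec.get("owns", [])]
```

The returned reminders correspond to the messages of step b: the checks are automatic, while the owned tuples themselves are still supplied by the user or by an application program.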

3.5.2.2. Tuple deletion algorithm:

Upon receipt of a request to delete tuple x in relation R:

a. Check for direct references to x from other tuples in the data model: If relation R is a
referenced relation or a lexicon, check that x is not referenced by any tuple from a relation
with a direct reference to R. If x is referenced, send an error message, and do not complete
the deletion.

b. Check if tuples owned by x may be deleted: For every relation Ri owned by R, initiate
deletion of the tuples in Ri owned by x. For every subrelation Rj of R, initiate deletion
of the tuple y in the subrelation that corresponds to x.

c. If all the owned and subrelation tuples can be deleted, complete deletion of x. Otherwise,
do not complete deletion of x, and send a warning message that x could not be deleted.

Deletion also consists of two parts: checking that the tuple being deleted is not referenced,
and deleting tuples owned by the tuple being deleted. The algorithm is recursively applied.
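
A possible Python rendering of the deletion algorithm is given below. It is a sketch under stated assumptions: the schema dictionary, the function names, and the choice to check the whole cascade before deleting anything are all hypothetical. The check is conservative, in that a reference from a tuple that would itself be deleted in the cascade still blocks the deletion.

```python
class IntegrityError(Exception):
    """Raised when a deletion would violate structural integrity."""


def referencing(db, schema, rel, key):
    """Step a: tuples holding a direct reference to tuple `key` of `rel`."""
    return [(r, k) for r, attr in schema.get("referenced_by", {}).get(rel, [])
            for k, row in db[r].items() if row[attr] == key]


def dependents(db, schema, rel, key):
    """Step b: owned tuples and subrelation tuples that must go with `key`."""
    found = [(owned, k) for owned, attr in schema.get("owns", {}).get(rel, [])
             for k, row in db[owned].items() if row[attr] == key]
    found += [(sub, key) for sub in schema.get("subrelations", {}).get(rel, [])
              if key in db[sub]]
    return found


def blockers(db, schema, rel, key):
    """All references that prevent deleting the tuple or any of its dependents."""
    found = referencing(db, schema, rel, key)
    for dep_rel, dep_key in dependents(db, schema, rel, key):
        found += blockers(db, schema, dep_rel, dep_key)
    return found


def delete_tuple(db, schema, rel, key):
    """Step c: delete only if the whole cascade can be completed."""
    blocked_by = blockers(db, schema, rel, key)
    if blocked_by:
        raise IntegrityError(f"{rel}[{key!r}] not deleted; referenced by {blocked_by}")
    _cascade(db, schema, rel, key)


def _cascade(db, schema, rel, key):
    # Recursively delete dependents first, then the tuple itself.
    for dep_rel, dep_key in dependents(db, schema, rel, key):
        _cascade(db, schema, dep_rel, dep_key)
    db[rel].pop(key, None)
```

Separating the check (blockers) from the actual removal (_cascade) is a design choice of this sketch; it ensures that a refused deletion leaves the database unchanged.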

3.5.2.3. Attribute update algorithm:

Upon receipt of a request to update attribute A of tuple x, which belongs to relation R:

a. If A is neither an attribute through which R references other relations, nor a member of
the ruling part of R, perform the update.

b. Update of connection attributes:

b.1. Referencing attributes: If A is an attribute through which R references a relation
Rj, check that the new value will reference an existing tuple in Rj. If the new
value references a non-existing tuple in Rj, do not complete the update and send
an error message.

b.2. Ruling part attributes: If A is a member of the ruling part of R, initiate deletion of
x using the deletion algorithm. If deletion is completed, insert the updated tuple
x' with the new value for A using the insertion algorithm. Otherwise, send an error
message.
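
The attribute update algorithm may be sketched in Python as below. The deletion and insertion algorithms are passed in as the hypothetical callables delete_fn and insert_fn so that the sketch stays self-contained; the schema layout and all names are assumptions made for illustration.

```python
class IntegrityError(Exception):
    """Raised when an update would violate structural integrity."""


def update_attribute(db, schema, rel, key, attr, value, delete_fn, insert_fn):
    """Update `attr` of tuple `key` in relation `rel`.

    Hypothetical inputs: `schema[rel]["references"]` maps referencing
    attributes to relations, `schema[rel]["ruling_part"]` lists ruling part
    attributes, and `delete_fn(db, rel, key)` / `insert_fn(db, rel, row)`
    stand for the deletion and insertion algorithms (raising on failure).
    """
    spec = schema.get(rel, {})
    row = db[rel][key]
    if attr in spec.get("ruling_part", ()):
        # Step b.2: delete x, then insert the updated tuple.
        updated = {**row, attr: value}
        delete_fn(db, rel, key)      # if this raises, the update is abandoned
        insert_fn(db, rel, updated)  # failure here is not covered by the text above
        return
    target = spec.get("references", {}).get(attr)
    if target is not None and value not in db[target]:
        # Step b.1: the new value must reference an existing tuple.
        raise IntegrityError(f"{rel}.{attr}: no tuple {value!r} in {target}")
    row[attr] = value                # step a, or step b.1 after a successful check
```

Note that the algorithm as stated does not say what should happen if the insertion of the updated tuple fails after the deletion has completed; an implementation would have to decide whether to restore the original tuple.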


4.  REPRESENTATION  OF  DATA  MODELS 


We now present the guidelines that the structural model offers to a data model designer,
and discuss how a choice is made between the different representation forms provided by the
structural model to represent a particular situation. We will see that the same data can be represented
with different relationships, according to the situation, or the view of the data model designer.
Eventually such differences can be accommodated in the integrated database model.

We  use  the  following  notation  to  represent  connections  in  our  diagrams: 

[Diagram legend: symbols for the ownership connection, the direct reference, and the identity reference]

4.1. Representation of relationships in the structural model:

One  of  the  advantages  of  the  structural  model  is  that  it  guides  the  choice  of  representation  for 
a particular  situation.  This  is  because  the  rules  attached  to  each  relation  and  connection  type  are 
explicit,  and  will  lead  the  data  model  designer  to  carefully  consider  the  situation  he  is  modelling. 
A model  relevant  to  the  real-world  situation  will  be  the  result,  and  the  situation  will  be  clearly 
represented. 

In  the  ensuing  discussion,  we  use  the  term  relationship  to  denote  a relationship  between  two 
real-world  entity  classes,  and  the  term  connection  to  denote  a connection  between  two  relations 
in  a data  model. 

Consider the relationship between two entity classes, FATHERS and CHILDREN. This is a
1:N  relationship,  and  may  be  represented  using  several  different  constructs  in  the  structural  model 
(figure  5): 

a.  As  an  association  between  two  entity  relations  representing  fathers  and  children. 

b.  As  a direct  reference,  from  an  entity  relation  representing  children,  to  a referenced  entity 
relation  representing  fathers. 

c.  As  an  ownership  connection,  from  an  entity  relation  representing  fathers,  to  a nest  entity 
relation  representing  children. 

The  choice  among  these  alternatives  depends  upon  the  situation  being  modelled. 

First,  consider  the  case  where  the  data  model  represents  a community  of  people.  Each  person 
in  the  community  has  an  identity  of  his  own,  and  we  want  to  represent  the  father-child  relationship 
between  two  persons  in  the  community.  In  this  case,  the  appropriate  representation  would  be  as 
an association between two persons, the FATHER-CHILD association relation (figure 5a). If either
the  father  or  his  offspring  move  from  the  community,  there  is  no  further  need  for  a father-child 
connection  between  two  persons  in  the  community.  This  is  well  represented  in  the  data  model  by 
the  association,  since  deletion  of  a father  (or  child)  tuple  causes  the  deletion  of  the  associating 
tuple,  but  leaves  the  tuple  representing  the  other  person  unaffected. 

On  the  other  hand,  suppose  the  data  model  represents  data  from  a school  system.  In  this 
case,  the  father-child  relationship  is  best  represented  by  a reference  connection  from  a CHILDREN 
relation  to  a FATHERS  relation  (figure  5b).  This  restricts  the  deletion  of  a father  tuple  as  long 
as  it  is  being  referenced  by  a child  tuple.  Again,  this  is  a faithful  representation  of  the  situation 
since we want to keep information on the father as long as he has a child in the school. Also, every
child in this school must have some information about his father. (If the father is unknown, an
"unknown father" tuple could be placed within the FATHERS relation.)

Figure 5. Some representations of the FATHER:CHILD relationship: (a) association, (b) reference, (c) nest

Figure 6. Some representations of the SUPPLIERS:PARTS:PROJECTS relationship: (a) association, (b) nest and association

Finally,  if  the  data  model  represents  data  from  a company,  and  a child  is  represented  in  the 
data  model  only  because  his  father  works  for  the  company,  then  the  relationship  is  best  represented 
as  a nest  relation  CHILDREN  owned  by  the  FATHERS  relation  (figure  5c).  (In  this  case,  FATHERS 
could  be  a subrelation  of  the  EMPLOYEES  relation.)  Then,  children  are  automatically  deleted 
from  the  data  model  once  their  father  is  deleted.  Here,  when  an  employee  is  fired  (and  the  decision 
is  made  to  remove  his  representation  from  the  active  employees  file),  the  company  is  not  interested 
in  any  information  about  his  children. 


Let us consider a second example, that of an inventory allocation. The situation being
represented is the association between suppliers, parts and projects. If each of the three entity classes
has an independent existence of its own, the appropriate representation is an association among
three entity relations SUPPLIERS, PARTS and PROJECTS (figure 6a).


Alternatively,  suppose  that  we  want  to  associate  with  each  supplier  the  parts  that  he  supplies, 
so  that  a part  does  not  have  an  independent  existence,  but  depends  on  the  supplier  that  supplies  the 
part.  Then,  the  situation  is  best  represented  by  two  entity  relations,  SUPPLIERS  and  PROJECTS, 
a nest  entity  relation  PARTS  owned  by  the  SUPPLIERS  relation,  and  an  association  relation 
PARTS-PROJECTS between PARTS and PROJECTS (figure 6b). Note that this represents the
full  association  of  SUPPLIER:PART:PROJECT,  since  by  the  definition  of  a nest  relation,  the 
ruling  part  of  the  nest  relation  PARTS  includes  the  ruling  part  of  the  SUPPLIERS  relation  (see 
section  3.4.1). 

These  two  examples  show  how  the  update  rules  associated  with  each  relation  type  are  used 
for  guidance  when  designing  a data  model.  The  update  rules  force  the  data  model  designer  to 
carefully  consider  the  characteristics  of  the  situation  that  he  is  modelling,  and  thus  the  data  model 
becomes a faithful representation of the situation.

4.2. Representation of a relationship between two entity classes:

In  this  section,  we  consider  all  possible  ways  in  which  the  structural  model  can  represent 
a relationship  between  two  entity  classes.  This  is  important  for  identifying  the  constraints  on 
relationships.  It  is  also  important  when  we  discuss  data  model  integration  in  section  5. 

Consider two entity classes, A and B, related in some way. One characteristic of the relationship
is its cardinality. The cardinality of the relationship restricts the number of entities of one class
that may be related to an entity of the other class. The cardinality of the relationship between A
and B may be:

(a) 1:1, an entity in A may be related to at most one entity in B, and vice versa.

(b) 1:N, an entity in A may be related to N entities in B, N ≥ 0, but an entity in B may be
related to at most one entity in A.

(c) M:N, an entity in A may be related to N entities in B, N ≥ 0, and an entity in B may
be related to M entities in A, M ≥ 0.

Cardinalities may be further constrained by specifying M and N as constant numbers. For
example, a 1:1 relationship is a constrained 1:N relationship with N set to 1.
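
As an illustration of these definitions, the following Python sketch reports the most restrictive cardinality class that a given set of related (A, B) pairs satisfies. The function name cardinality and the encoding of a relationship as a collection of pairs are hypothetical; observed data can only show that a tighter class is violated, not establish the intended cardinality.

```python
from collections import Counter


def cardinality(pairs):
    """Most restrictive cardinality of A:B satisfied by the (a, b) pairs."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # number of B entities related to each A
    per_b = Counter(b for _, b in pairs)  # number of A entities related to each B
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(m > 1 for m in per_b.values())
    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:N"  # an A has several Bs, each B has at most one A
    if b_has_many:
        return "N:1"  # a B has several As, each A has at most one B
    return "1:1"
```

For example, a department related to two employees, each of whom belongs to that department only, is classified as 1:N.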

The  second  characteristic  of  relationships  is  the  dependency.  The  dependency  specifies  whether 
an  entity  of  one  class  can  exist  independently,  or  whether  it  must  be  related  to  an  existing  entity 
of the other class. Dependencies can be classified into three types:

(a)  A total  dependency  specifies  that  entities  in  both  classes  must  be  related  to  a specified  number 
of  entities  of  the  other  class  at  all  times. 

(b)  A partial  dependency  specifies  that  entities  from  one  class,  entity  class  A say,  must  be  related 
to  a specified  number  of  entities  of  the  other  class,  B here,  but  that  entities  in  B can  exist 
independently. 

(c)  A no  dependency  specifies  no  dependency  constraints. 

A direct relationship between the two entity classes A and B may be represented in the
structural model as one of five choices (figure 7):

(1) A reference connection: entity class A is represented as a relation Ra, referencing the
relation Rb that represents entity class B (figure 7a). The cardinality of the relationship
A:B is N:1, N ≥ 0, and the dependency is partial of A on B (each entity in A must be
related to exactly one entity in B).

(2) An ownership connection: entity class A is represented by a relation Ra that owns a nest
relation Rb representing entity class B (figure 7b). The cardinality of the relationship A:B
is 1:N, N ≥ 0, and the dependency is partial of B on A (each entity in B must be related
to exactly one entity in A).

(3) An association relation: relations Ra and Rb represent entity classes A and B, and an
association relation Rab represents the relationship (figure 7c). The cardinality of the
relationship A:B is M:N, M ≥ 0, N ≥ 0, and there is no dependency.

(4) A nest of references: relations Ra and Rb represent the entity classes A and B. A nest relation
Rab owned by Ra, and a reference connection from Rab to Rb represent the relationship
(figure 7d). The cardinality of A:B is M:N, M ≥ 0, N ≥ 0, and there is no dependency.

(5) A primary relation and two reference connections: relations Ra and Rb represent the entity
classes, and the relationship is represented by a primary relation Rab and two reference
connections from Rab to Ra and Rb (figure 7e). The cardinality of A:B is M:N, M ≥ 0,
N ≥ 0, and there is no dependency.

Other  relationships  may  exist  indirectly.  For  example,  if  entity  classes  A and  B,  and  entity 
classes  B and  C are  directly  related,  an  indirect  relationship  exists  between  entity  classes  A and 
C. We will only further consider direct relationships in this report.

Figure 7. Representing two directly related entity classes

Data models that represent the same two related entity classes may use different
representations for the relationship according to the way they view the update constraints. Two
reasons for choosing different representations can be distinguished: difference in understanding
and difference in representation. We illustrate the differences with an example.

(1) The two data models differ in their understanding of the same real-world situation. Consider
the two entity classes DEPARTMENTS and EMPLOYEES. It is possible that one user
assumes that the relationship between DEPARTMENTS and EMPLOYEES is 1:N (each
employee works in only one department). A second user is aware of exceptions and
considers the relationship M:N (an employee may work in more than one department). A
disagreement exists here about the actual situation being modelled, and one of the data
models is in error. It may be that the first user knows only about employees that work
in one department. If such a conflict occurs between the two data models, the real-world
situation being modelled must be re-examined to determine its actual characteristics. We
will not consider this problem further.

(2) The two data models represent the real-world situation differently, each user choosing the
representation which best suits his integrity control requirements. Consider the
DEPARTMENTS and EMPLOYEES example, and suppose the relationship is of cardinality 1:N.
It may be represented in one of the following ways, among others:

(a) a reference connection from EMPLOYEES to DEPARTMENTS (figure 8a),

(b) an ownership connection from DEPARTMENTS to EMPLOYEES (figures 8b, 8c),

(c) an association relation restricted to 1:N (figure 8d),

(d) a nest of references from EMPLOYEES to DEPARTMENTS (figure 8e).


The  different  representations  reflect  different  integrity  requirements: 

The  reference  representation  requires  each  employee  represented  in  the  data  model  to  belong 
to  a department,  and  restricts  deletion  of  a department  from  the  data  model  while  it  is 
referenced  by  some  employee. 

The ownership connection representation also requires that each employee belong to a
department, but deletion of a department tuple from the data model results in the
deletion of the tuples of all employees who work in that department.

The association does not place any constraints on the existence of the actual entities
represented, the employee and department tuples. However, an association can exist only
between tuples represented in the data model.

Finally, the nest of references restricts the deletion of a department while referenced by
some employee, but allows employee tuples to exist in the data model that are not related
to any department.

Figure 8. Different representations of the DEP:EMP 1:N relationship (panel (b): nest with unique employee identification)
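
The effect of the four representations on the deletion of a department can be contrasted in a short Python sketch. The data layout and the function delete_department are hypothetical simplifications; in particular, removing the associating tuples when a department is deleted follows the behaviour of association relations described in section 4.1.

```python
class IntegrityError(Exception):
    """Raised when a deletion is restricted."""


def delete_department(kind, departments, employees, links, dep):
    """Delete `dep` under one of four representations of DEP:EMP.

    Hypothetical layout: `departments` is a set of department numbers,
    `employees` maps employee number -> department number (used by the
    reference and ownership cases), and `links` is a set of (dep, emp)
    pairs (used by the association and nest of references cases).
    """
    if kind == "reference":
        if dep in employees.values():
            raise IntegrityError("department is referenced by an employee")
    elif kind == "ownership":
        for emp in [e for e, d in employees.items() if d == dep]:
            del employees[emp]  # owned employee tuples are deleted
    elif kind == "association":
        # Only the associating tuples are removed; employees are unaffected.
        links.difference_update({(d, e) for d, e in links if d == dep})
    elif kind == "nest of references":
        if any(d == dep for d, _ in links):
            raise IntegrityError("department is referenced by an employee")
    else:
        raise ValueError(f"unknown representation: {kind}")
    departments.discard(dep)
```

The same request is thus refused, cascaded, or carried out locally depending solely on the representation chosen by the data model designer.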


Since the association representation can be used to represent M:N relationships, but here the
DEP:EMP relationship is 1:N, the EMP-NO attribute must have a unique value for each tuple in
the association relation. This is indicated in figure 8d by marking the attribute with a (U). Note
that this does not violate Boyce-Codd normal form.

The nest of references may also represent an M:N relationship, and to restrict it to 1:N, we
also mark the EMP-NO attribute in the connecting nest relation by a (U) (figure 8e). We will use
this convention throughout the examples in section 5.
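
A check of the (U) marking could look as follows in Python. The function unique_violations and the representation of tuples as dictionaries are hypothetical; the sketch merely reports the attribute values that occur in more than one tuple of the connecting relation.

```python
def unique_violations(rows, attr):
    """Return the values of `attr` that occur in more than one tuple."""
    seen, repeated = set(), set()
    for row in rows:
        value = row[attr]
        if value in seen:
            repeated.add(value)
        seen.add(value)
    return repeated
```

An empty result for EMP-NO means that no employee is connected to more than one department, so the connecting relation still represents a 1:N relationship.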

In  the  ownership  connection  representation,  we  must  consider  two  cases.  The  identifying 
attribute  for  each  EMP  tuple  in  figure  8b  is  EMP-NO,  and  has  unique  values  for  each  employee 
independent  of  his  department.  Hence,  we  mark  it  (U).  In  figure  8c,  the  identifying  attributes  for 
an  EMP  tuple  are  the  two  attributes  DEP-NO  and  EMP-ID,  where  EMP-ID  serves  to  define  the 
employee  within  his  department,  and  hence  is  unique  within  a department  but  is  not  unique  over 
all employees.

The different views may all be equally valid, and hence more than one set of views, and
corresponding semantics, has to be retained in the integrated database model so that it can serve
in a variety of situations.

We now consider the problem of integrating different data models, defined by independent user
groups and applications, into an integrated database model, to be used as the conceptual schema.
We assume a database system architecture similar to that described by the ANSI/X3/SPARC
report.


5.  INTEGRATION  OF  DATA  MODELS 


We now discuss integration of data models. First we briefly define our terminology for logical
database design.

A DATA  MODEL  is  a representation  of  the  requirements  of  a particular  potential  database 
user  group  or  application.  The  definition  of  data  models  for  individual  user  groups  that  expect  to 
use  the  database  is  the  first  step  in  the  design  of  an  integrated  database. 

The  DATABASE  MODEL  is  the  integrated  model  created  by  merging  the  individual  data 
models.  During  merging,  differences  in  view  are  bound  to  appear.  The  differences  may  be  resolved 
by  transformations  of  the  original  data  models.  It  is  possible  that  unresolvable  conflicts  will  emerge 
among the original data models. Then management decisions have to be made to force data model
changes,  or  to  abandon  the  integration  with  respect  to  some  data  models. 

A DATABASE  SUBMODEL  is  the  user  or  application  view  that  is  consistent  with  the  integrated 
database  model.  Hence,  if  no  conflicts  occurred  between  a user  data  model  and  the  integrated 
database  model,  the  database  submodel  for  that  user  will  be  the  same  as  the  data  model.  If  some 
conflict  had  arisen,  some  differences  will  exist  between  the  data  model  and  the  database  submodel. 

In  section  5.1  we  consider  some  general  concepts  of  data  model  integration,  and  in  section 
5.2  we  consider  the  integration  of  relations  from  different  data  models  that  represent  the  same 
real-world  entity  class.  In  section  5.3  we  show  how  to  integrate  two  different  representations  of  a 
relationship  between  the  same  two  real-world  entity  classes. 


5.1.  Concepts  of  integration: 

The data models we integrate will represent real-world situations that partially overlap;
otherwise there would be no need for integration. Hence we expect to discover relations in separate
data  models  that  represent  the  same  entity  classes.  The  first  phase  of  integration  is  to  recognize 
such  relations.  This  is  not  always  a simple  task,  since  different  data  models  may  use  different 
names  for  relations  that  represent  the  same  entity  class. 

Recognition  of  relations  that  represent  the  same  entity  class  in  different  data  models  is  based 
on  matching  ruling  parts,  since  the  ruling  part  defines  the  correspondence  to  an  entity  class.  The 
relation names and the ruling part attribute names can provide an initial hint to such
correspondences. If data exists, similar values within the ruling part attributes can further indicate candidates
for entity matching. A match or overlap of the domain definition of ruling part attributes can
establish the necessary equivalence.
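
The first phase of integration might be supported by a simple search for candidate correspondences, sketched below in Python. The encoding of a data model as a dictionary from relation names to the domain names of the ruling part attributes is hypothetical, as is the function candidate_matches. Only exact agreement of domains is tested here; overlapping domains and translation through lexicons are not handled.

```python
def candidate_matches(dm1, dm2):
    """Pair relations whose ruling part attributes cover the same domains.

    Each data model is a dict: relation name -> list of the domain names of
    its ruling part attributes (a hypothetical encoding).
    """
    return [(r1, r2)
            for r1, domains1 in dm1.items()
            for r2, domains2 in dm2.items()
            if sorted(domains1) == sorted(domains2)]
```

The pairs returned are candidates only; relation names, attribute names and existing data values would still have to be inspected before two relations are accepted as representing the same entity class.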

Ruling  parts  may  be  translated  via  lexicons,  so  the  search  for  similar  ruling  parts  must  also 
consider  lexicons  of  ruling  parts  in  the  data  models.  Since  lexicons  preserve  the  identity  of  ruling 
parts,  we  will  not  specify  throughout  that  lexicons  can  be  used  in  the  matching  of  ruling  parts. 
Some  examples  of  equivalence  through  lexicons  will  be  given  in  section  5.2. 

We  assume  in  this  report  that  rigorous  definitions  exist  for  the  domains  that  the  attributes 
cover.  Definition  of  domains  and  attribute  encoding  can  be  a major  effort,  but  is  outside  the  scope 
of  this  report.  This  problem  is  also  addressed  by  people  working  on  the  requirements  analysis 
phase  of  database  design. 

The  second  phase  of  integration,  following  the  recognition  of  relations  that  represent  the 
same  entity  classes,  is  the  recognition  of  differences  in  the  representations.  These  differences  are 
of  three  types: 

(1)  Representation  of  different  properties  of  the  same  entity  class.  This  is  reflected  in  different 
dependent  part  attributes  in  the  relations  that  represent  the  same  entity  class. 


(2) Representation of different subsets of entities of the same entity class. This is reflected in
different tuples in the relations that represent the same entity class.

(3)  A combination  of  (1)  and  (2). 

We  will  cover  integration  of  those  cases  in  section  5.2. 
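
The three types of difference can be told apart mechanically once two relations have been matched, as the following Python sketch suggests. The function classify_difference and its arguments are hypothetical, and comparing the tuples currently stored is only a proxy for the subsets of entities that each data model intends to represent.

```python
def classify_difference(attrs1, keys1, attrs2, keys2):
    """Return 0 if the representations agree, else the difference type 1, 2 or 3.

    attrsN: dependent part attribute names of the relation in data model N;
    keysN:  ruling part values present in that relation (hypothetical encoding).
    """
    different_properties = set(attrs1) != set(attrs2)
    different_subsets = set(keys1) != set(keys2)
    if different_properties and different_subsets:
        return 3
    if different_properties:
        return 1
    if different_subsets:
        return 2
    return 0
```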

The  final  phase  is  to  integrate  the  representation  of  relationships  between  two  entity  classes. 
As  shown  in  section  4.2,  there  are  five  ways  to  represent  direct  relationships  in  the  structural 
model.  Data  models  may  choose  to  represent  the  relationship  between  the  same  two  entity  classes 
differently,  according  to  their  view  of  the  situation.  Hence,  the  final  phase  of  integration  is  to 
create  an  integrated  database  model  which  will  support  different  representations  of  relationships 
in  the  data  models.  We  cover  this  phase  in  section  5.3. 

Many  data  models  may  have  to  be  integrated  into  a single  database  model.  To  avoid  excessive 
complexity  we  will  analyze  the  integration  of  only  two  data  models  in  detail.  Successive  integration 
steps  can  merge  another  data  model  with  the  database  model  being  built,  creating  a new  database 
model.  Since  both  data  models  and  database  models  use  the  same  primitives,  this  should  not  pose 
a problem. 

We hence have two data models, data model 1 (dm1) and data model 2 (dm2). Both data
models will include relations that represent some common entity classes, as well as other classes
of data. We only look at one entity class A in section 5.2, and two entity classes A and B with
a relationship between them in section 5.3. We will denote the relations that represent entity
classes A and B in dm1 and dm2 by Ra and Rb. If both representations are the same, clearly there
is no need for any transformation, and the integrated database model (idbm) will use the same
representation. If representations differ, we create an idbm to support both data models.

The idbm will then support database submodel 1 (dbsm1) and database submodel 2 (dbsm2),
corresponding to dm1 and dm2 respectively. In most cases, dm1 and dm2 will not be changed, so
dbsm1 and dbsm2 will be equivalent to dm1 and dm2. In some cases, where conflicts appear, one
of the data models may have to be changed, and the corresponding database submodel will reflect
those changes. When the database model is established, it may also be desirable for pragmatic
reasons to change a database submodel to achieve a better agreement with the database.

In some cases, only a subset of the tuples in relation Ra (or Rb) in the idbm correspond to
the Ra (or Rb) relation included in dbsm1 or dbsm2. We then use a subrelation to represent the
subset, and an identity connection will join it to Ra in the idbm. For example, if Ra in dbsm1
corresponds to a subrelation of Ra in the idbm, we denote this subrelation by Ra1 in the idbm, and
Ra1 will have an identity reference to Ra. This subrelation Ra1 of Ra contains only the ruling part
attributes of Ra, so that no duplication of information occurs in the idbm. All other attributes in
Ra can be accessed through the identity reference to Ra.

We do not address the problem of authorization of users to perform insertion and deletion.
We assume that every database submodel has complete insert, delete, and update authorization
over the part of the database model it represents. Hence, if one submodel, dbsm1 say, inserts
a tuple that does not violate the integrity constraints of dbsm2, the tuple is inserted in both of
them. If the tuple violates the integrity constraints of dbsm2, it is inserted but remains invisible to
dbsm2. For deletion, if deletion of a tuple is legal in dbsm1, say, but the tuple may not be deleted
in dbsm2 because of integrity constraints, the tuple will be kept in the idbm and in dbsm2, but
will become invisible to dbsm1.
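
The visibility rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the per-submodel predicates, and the sample tuples are hypothetical and do not come from the report, and the deleting submodel is assumed to be allowed to delete the tuple in its own view.

```python
class IntegratedRelation:
    """One idbm relation shared by several database submodels (illustrative only)."""

    def __init__(self, insert_ok, delete_ok):
        self.insert_ok = insert_ok  # submodel name -> predicate(tuple): insertion is legal
        self.delete_ok = delete_ok  # submodel name -> predicate(tuple): deletion is legal
        self.tuples = {}            # ruling-part value -> tuple stored in the idbm
        self.visible = {}           # ruling-part value -> submodels that can see the tuple

    def insert(self, by, key, tup):
        # The tuple is always stored; another submodel sees it only if the
        # tuple satisfies that submodel's own integrity constraints.
        self.tuples[key] = tup
        self.visible[key] = {s for s, ok in self.insert_ok.items() if s == by or ok(tup)}

    def delete(self, by, key):
        # Assumption: the deletion is legal in the deleting submodel `by`.
        tup = self.tuples[key]
        remaining = self.visible[key] - {by}
        if any(not self.delete_ok[s](tup) for s in remaining):
            self.visible[key] = remaining  # kept in the idbm, hidden from `by`
        else:
            del self.tuples[key], self.visible[key]


# Hypothetical constraints: dbsm2 requires a department and protects referenced tuples.
rel = IntegratedRelation(
    insert_ok={"dbsm1": lambda t: True, "dbsm2": lambda t: t["dept"] is not None},
    delete_ok={"dbsm1": lambda t: True, "dbsm2": lambda t: not t["referenced"]},
)
rel.insert("dbsm1", "smith", {"dept": None, "referenced": False})
rel.insert("dbsm1", "jones", {"dept": "sales", "referenced": True})
rel.delete("dbsm1", "jones")
```

After these calls the "smith" tuple is visible to dbsm1 only, and the "jones" tuple remains in the idbm but is visible to dbsm2 only.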

After integration, dbsm1 and dbsm2 are both supported by the idbm. A mapping will exist
from each submodel to the idbm. This mapping includes additional integrity rules, derived from
the integration process, which will apply to the idbm. These rules are enforced when a database
submodel performs an insertion, deletion, or update. We will list these additional rules with each
case of integration.


5.2.  Integration  of  different  representations  of  entity  classes:


5.2.1.  Recognition  of  relations  that  represent  the  same  entity  class: 

This  phase  of  integration  requires  the  recognition  of  relations  included  in  different  data  models 
that  represent  the  same  entity  class.  Knowledge  of  the  real-world  situations  being  modelled  is 
helpful  to  match  relations  that  represent  the  same  entity  class  but  have  different  names  for  relations 
and  ruling  part  attributes.  The  domain  definitions  of  ruling  part  attributes  will  then  verify  the 
equivalence  of  such  relations  by  their  partial  overlap  or  total  match. 

Some models may include lexicons of ruling parts for some of the relations in the model.
Examination of such lexicons is necessary when matching ruling parts. For example, dm1 may
include a relation EMPLOYEES that contains the attributes (EMP-NAME, ADDRESS,
HOME-PHONE, OFFICE, OFFICE-PHONE, DEPT), representing a directory of the employees. Data
model 2, representing job information, includes a relation EMP that contains the attributes
(EMP-NUMBER, AGE, JOB, SALARY, DEPT), and a lexicon relation (EMP-NUMBER,
EMP-NAME) (figure 9a). To recognize that both relations represent the same entity class of
EMPLOYEES, the integrators must consider both the EMP-NUMBER and EMP-NAME attributes
from the lexicon relation in dm2 when matching the ruling part of the EMP relation to the ruling
part of the EMPLOYEES relation.
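
As a rough illustration of matching ruling parts through lexicons, the following Python sketch treats each lexicon as a pair of interchangeable attribute names. The function names are hypothetical, and the sketch assumes that EMP-NAME is the ruling part of EMPLOYEES and EMP-NUMBER the ruling part of EMP, which the text implies but does not state outright. Domain checks are left out.

```python
def canonical_ruling_part(attrs, lexicons):
    """Map each ruling-part attribute to a representative of its lexicon class."""
    rep = {}
    for a, b in lexicons:
        ra, rb = rep.get(a, a), rep.get(b, b)
        for name, r in list(rep.items()):
            if r == rb:
                rep[name] = ra  # merge b's class into a's class
        rep[a] = rep[b] = ra
    return frozenset(rep.get(x, x) for x in attrs)


def same_ruling_part(ruling1, ruling2, lexicons):
    """True if the two ruling parts coincide once lexicons are taken into account."""
    return canonical_ruling_part(ruling1, lexicons) == canonical_ruling_part(ruling2, lexicons)


lexicons_dm2 = [("EMP-NUMBER", "EMP-NAME")]
match = same_ruling_part({"EMP-NAME"}, {"EMP-NUMBER"}, lexicons_dm2)
```

Without the lexicon the two ruling parts do not match; with it they do, which is the situation of figure 9a.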

5.2.2.  Integration  of  relations  that  contain  different  attributes: 

We first consider the case where one representation dominates the other. Here, dm1 includes a
relation R1, and dm2 includes a relation R2 that represents the same entity class as R1, and contains
all the attributes represented in R1, plus some additional dependent part attributes. The idbm
will include a relation R that contains the set of attributes represented in R1, and a subrelation
R' of R that contains the dependent part attributes represented in R2 but not in R1. The tuples
in R correspond to the R1 tuples in dbsm1, while the subset of tuples in R' will correspond to the
R2 tuples in dbsm2. When dbsm1 inserts a tuple, it is only inserted in R, since it does not contain
the dependent part attributes of R'. The tuple is only visible to dbsm1. When dbsm2 inserts a
tuple, it is inserted in both R and R', since it contains the dependent part attributes of both R
and R'. Hence, the tuple is visible to dbsm1 also.

The general case is that neither relation R1 of dm1 nor relation R2 of dm2 contains the
complete set of attributes, but each contains a set of attributes common to both models, and a set
of dependent part attributes unique to its model. In this case, we must create two subrelations. An
example is shown in figure 9. Relation R represents the common attributes, and two subrelations
R1 and R2 are used to represent the tuples in dbsm1 and dbsm2 respectively. When dbsm1 inserts
an employee tuple, it is inserted in R and R1, but is invisible to dbsm2. When dbsm2 inserts the
tuple with the same ruling part value, the tuple is also inserted in R2, and becomes visible to
dbsm2. A check has to be performed to ensure that common attributes have the same values. Thus,
the base relation R ensures the integrity of data values that are common to both data models.
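
A minimal Python sketch of this general case follows, with a base relation R and two subrelations R1 and R2 keyed by the ruling part. The class name, method names, and sample values are hypothetical. The check on common attributes is modelled here as a simple equality test, which is one possible reading of the check the text calls for.

```python
class IntegratedEmployees:
    """Base relation R plus subrelations R1 (dbsm1) and R2 (dbsm2); illustrative only."""

    def __init__(self):
        self.R = {}   # ruling part -> attributes common to both data models
        self.R1 = {}  # ruling part -> dependent attributes unique to dbsm1
        self.R2 = {}  # ruling part -> dependent attributes unique to dbsm2

    def insert(self, submodel, key, common, own):
        # Common attributes must agree with what the base relation already holds.
        if key in self.R and self.R[key] != common:
            raise ValueError(f"common attributes of {key!r} disagree with base relation")
        self.R[key] = common
        (self.R1 if submodel == "dbsm1" else self.R2)[key] = own

    def visible_to(self, submodel):
        # A submodel sees only tuples in its own subrelation, joined with R.
        sub = self.R1 if submodel == "dbsm1" else self.R2
        return {key: {**self.R[key], **own} for key, own in sub.items()}


db = IntegratedEmployees()
db.insert("dbsm1", "Jones", {"DEP-NO": 7}, {"OFFICE": "B12"})
hidden = db.visible_to("dbsm2")          # empty: dbsm2 has not inserted Jones yet
db.insert("dbsm2", "Jones", {"DEP-NO": 7}, {"JOB": "welder"})
shown = db.visible_to("dbsm2")           # Jones is now visible to dbsm2
```

A later insertion of "Jones" with a different DEP-NO would raise ValueError in this sketch, mirroring the consistency check on common attributes.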

The lexicon relation only references R2, since it is only represented in dbsm2.

If the two data models use different ruling part attributes, and neither represents the ruling
part attributes in the other data model (for example, if in figure 11a dm2 did not include the
lexicon), then two solutions exist. The first solution is to change one of the data models to include
the ruling part attributes of the other data model. The second solution, which involves the database
administrator, is to create a lexicon in the idbm in which every new tuple is included before its
insertion by either data model.

We are only dealing with the data model here. When actual databases are to be integrated,
inconsistencies may exist in the data. For example, the same employee may have his department
listed as 'foundry' in one data model, and as 'management' in another. This problem is a
post-design issue, although we note that the structural model would not allow this inconsistency if the
different submodels insert their tuples representing the same employee at different times.

DM1 (directory of employees)
    relation R1 (EMPLOYEES):   EMP-NAME  ADDRESS  HOME-PHONE  OFFICE  OFFICE-PHONE  DEP-NO

DM2 (job information)
    relation R2 (EMP):         EMP-NO  AGE  JOB  SAL  DEP-NO
    (lexicon):                 EMP-NO  EMP-NAME

(a) Lexicon of a ruling part that must be considered

    relation R:                           EMP-NAME  DEP-NO
    subrelation R1 (subset in DBSM1):     EMP-NAME  ADDRESS  HOME-PHONE  OFFICE  OFFICE-PHONE
    subrelation R2 (subset in DBSM2):     EMP-NAME  AGE  JOB  SAL
    (lexicon, visible to DBSM2 only):     EMP-NAME  EMP-NO

(b) Integrated database model


Figure 9. Integration of different sets of attributes (with lexicon)

We also note that although many subrelations may exist for the same base relation in the
integrated database model, this is only at the model level. At the implementation level, the base
relation and all its subrelations may be placed in the same file, with a conditional field for each
subrelation in each record to indicate whether the record is in the subrelation or not. It may also
be worthwhile to change database submodels by making them aware of a few additional attributes
to simplify the database model.


5.2.3.  Integration  of  relations  that  represent  different  sets  of  tuples: 

We now consider the case where there are differences in the selection of entities to be
represented. Suppose one data model, dm1, includes a relation R1, and dm2 includes a relation
R2 that represents a subset of the tuples in R1. The idbm will then include a relation R and a
subrelation R2 of R to represent the tuples in R2 of dbsm2. The subrelation R2 may be a restriction
subrelation if the subset of tuples in R2 is determined by attribute values in R, or a non-restriction
subrelation if the subset of tuples in R2 is determined externally, independent of the model.

For example, dm1 (for the payroll department) may represent all employees of a company
in an EMPLOYEES relation, while dm2 (for the sales department of the company) includes
the relation SALES FORCE, the employees that work in the sales department. The idbm then
includes a relation EMPLOYEES, and a subrelation SALES FORCE of EMPLOYEES. If the
EMPLOYEES relation contains a DEPARTMENT attribute, the subrelation SALES FORCE is
a restriction subrelation on the DEPARTMENT attribute, restricting the attribute to the value
sales. If EMPLOYEES does not contain a DEPARTMENT attribute, SALES FORCE would be
a non-restriction subrelation. In either case, after integration, dbsm2 is only allowed access to
tuples in the SALES FORCE subrelation, but could still access their attribute values from the
base relation EMPLOYEES, while dbsm1 would be allowed access to all employee tuples.
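
The difference between the two kinds of subrelation can be sketched as follows. This is an illustration only: the employee keys, the function names, and the class are hypothetical. A restriction subrelation is computed from attribute values in the base relation, while a non-restriction subrelation keeps an explicit membership list that is maintained from outside the model.

```python
employees = {  # hypothetical EMPLOYEES tuples, keyed by ruling part
    "adams": {"DEPARTMENT": "sales"},
    "baker": {"DEPARTMENT": "payroll"},
    "clark": {"DEPARTMENT": "sales"},
}


def restriction_subrelation(base, attribute, value):
    """Membership is derived from attribute values in the base relation."""
    return {key for key, tup in base.items() if tup.get(attribute) == value}


class NonRestrictionSubrelation:
    """Membership is recorded explicitly, independent of attribute values."""

    def __init__(self, base):
        self.base, self.members = base, set()

    def add(self, key):
        if key not in self.base:
            raise KeyError(f"{key!r} is not in the base relation")
        self.members.add(key)


def view(base, members):
    """Tuples a submodel may access: subrelation members, attributes read from the base."""
    return {key: base[key] for key in members}


sales_force = restriction_subrelation(employees, "DEPARTMENT", "sales")
dbsm2_view = view(employees, sales_force)
```

In both variants the submodel reads attribute values from the base relation, so no attribute is stored twice.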


The general case is that the tuples in the two relations partially overlap each other. Then dm1
includes relation R1 and dm2 includes relation R2 that represent the same entity class, such that
the tuples in the two relations obey the constraints R1 ∩ R2 ≠ ∅, R1 − R2 ≠ ∅, and R2 − R1 ≠ ∅.

The idbm then includes a relation R = R1 ∪ R2, and two subrelations of R, R1 and R2. Again,
R1 or R2 could be either restriction or non-restriction subrelations. For example, referring to a
university database, dm1 (representing the computer science department of the university) includes
a relation CSD PROFESSORS, and dm2 (representing information about permanent faculty)
includes a relation TENURED PROFESSORS. The idbm then includes a relation PROFESSORS,
and two subrelations of PROFESSORS, CSD PROFESSORS and TENURED PROFESSORS.
Each database submodel is allowed access to its subset, and the base relation assures the integrity
of common data represented in both models.
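
The tuple-set cases of this section can be checked mechanically. The sketch below is an assumption-laden illustration: it compares sets of ruling-part values, the function names are hypothetical, and the professor identifiers are placeholders rather than data from the report.

```python
def classify(r1, r2):
    """Compare the tuple sets (ruling-part values) of two relations for one entity class."""
    if r1 == r2:
        return "identical"
    if r1 < r2 or r2 < r1:
        return "subset"
    if r1 & r2:
        return "partial overlap"  # R1 ∩ R2, R1 − R2 and R2 − R1 are all non-empty
    return "disjoint"


def integrate(r1, r2):
    """Base relation R = R1 ∪ R2 with the two original sets kept as subrelations."""
    return {"R": r1 | r2, "R1": set(r1), "R2": set(r2)}


csd_professors = {"p1", "p2", "p3"}      # placeholder ruling-part values
tenured_professors = {"p2", "p3", "p4"}
kind = classify(csd_professors, tenured_professors)
idbm = integrate(csd_professors, tenured_professors)
```

Here `kind` is "partial overlap", and the base relation PROFESSORS corresponds to `idbm["R"]`.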

In  the  last  example,  it  is  possible  that  the  relation  in  each  data  model  contains  attributes 
common  to  both  relations,  and  a set  of  its  own  attributes.  Then,  the  base  relation  in  the  idbm 
will  contain  the  common  attributes,  and  each  subrelation  will  contain  its  own  additional  set  of 
attributes. 

5.3.  Integration  of  different  representations  of  relationships: 

In the following sections (5.3.1 - 5.3.4), we assume that we have two data models, dm1 and
dm2, and that both data models represent two entity classes A and B, and a relationship between
them. Ra and Rb will denote the relations that represent entity classes A and B. If the representation
of the relationship between A and B involves an auxiliary relation (association, primary or nest
relation) we will designate it Rab.

There are five ways of representing a relationship between two entity classes in the structural
model (section 4.2). Three of these representations are not symmetric with respect to A and B
(reference, nest, nest of references), and two are symmetric (association, primary). If we consider
all possible combinations without looking at symmetries, the set of possible cases for combining
different representations pairwise is 2 × (5 + 4 + 3 + 2 + 1) = 30. We remove 5 cases where
the representation is identical in both data models, and (5 + 4) cases because the association and
primary cases are symmetric with respect to Ra and Rb. Then 30 - 5 - 9 = 16 cases remain to be
considered. We consider all possible combinations with the association representation first (4 cases)
in section 5.3.1. We then consider the cases that remain with nest of references (6 cases, section
5.3.2), with references (4 cases, section 5.3.3), and with nest (2 cases, section 5.3.4).
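
The number of cases can be cross-checked by brute force. The Python sketch below uses a different counting method from the one in the text, stated here as an assumption: each non-symmetric representation is given two orientations, all unordered pairs of distinct variants are formed, and pairs that differ only by exchanging A and B are merged. The per-section breakdown listed above (4 + 6 + 4 + 2) sums to 16, and the enumeration arrives at the same total.

```python
from collections import Counter
from itertools import combinations

ORDER = ["association", "nest of references", "reference", "nest", "primary"]
SYMMETRIC = {"association", "primary"}

# Non-symmetric representations come in two orientations.
variants = []
for name in ORDER:
    if name in SYMMETRIC:
        variants.append((name, ""))
    else:
        variants += [(name, "A->B"), (name, "B->A")]


def swap(variant):
    """Exchange the roles of entity classes A and B."""
    name, direction = variant
    return (name, {"": "", "A->B": "B->A", "B->A": "A->B"}[direction])


def canonical(x, y):
    """Pick one representative for a pair and its A/B-swapped twin."""
    return min(tuple(sorted((x, y))), tuple(sorted((swap(x), swap(y)))))


cases = {canonical(x, y) for x, y in combinations(variants, 2)}
# Attribute each case to the representation treated first in the section order.
breakdown = Counter(min(x[0], y[0], key=ORDER.index) for x, y in cases)
```

`len(cases)` gives the total, and `breakdown` gives the number of cases attributed to each of sections 5.3.1 to 5.3.4.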

5.3.1.  Integration  with  an  association: 


In this section, we consider integration of an association with other representations of a
relationship. In those cases, dm1 represents the relationship A:B as an association relation, and
dm2 will use a different representation. The association may represent a relationship of cardinality
M:N. Our assumption (section 4.2) that both original data models accurately represent the same
situation implies that the cardinality of both representations is the same. Hence, the cardinality
of the relationship is restricted to that of the representation in dm2.

In  order  to  demonstrate  how  two  different  data  models  may  be  integrated,  we  will  present 
the  integration  of  an  association  with  the  nest  of  references  (figure  10a). 

In this case, the only difference is that dm1 can freely delete tuples from Rb, while in dm2
deletion is restricted by referencing tuples from Rab. Hence, we create two subrelations, Rb1 and
Rab1. Those subrelations represent the tuples in Rb (and Rab) of dbsm1. Tuples in Rb and Rab
in the idbm may include some tuples deleted from dbsm1, but not deleted from Rb and Rab in
the idbm due to the deletion constraint of the reference in dbsm2. These tuples are not visible to
dbsm1.

Figure 10a. Integration of association and nest of references

The database submodels now obey the following rules. Insertion and deletion in Ra from either
dbsm1 or dbsm2 is unrestricted, as is deletion of Rab tuples, and unreferenced Rb tuples. If dbsm1
deletes a referenced Rb tuple (dbsm2 may not perform such a deletion), it is only deleted from
Rb1 (and the owned tuples are deleted from Rab1). These rules accurately reflect the constraints
imposed by the views represented in the original data models.

For brevity, we will use the following format for each integration case. We first list the
differences between the two data models, then list the additional integrity constraints that have to
exist in the mapping from the database submodels to the integrated database model. When listing
these additional constraints, ("relation name") will mean: do the insertion or deletion specified
on "relation" if allowed by the integrity constraints of the idbm.

We  will  now  present  the  demonstration  case  again  in  brief  notation. 

(a) ASSOCIATION AND NEST OF REFERENCES (figure 10a):

Differences:

Dm1 may freely delete tuples from Rb, while in dm2, deletion of Rb tuples is restricted.


Additional constraints:
dbsm1:

insert: (1) Rb -> Rb, Rb1, (2) Rab -> (Rab, Rab1)
delete: (1) Rb -> (Rb), Rb1

dbsm2:

insert: (1) Rb -> Rb, Rb1, (2) Rab -> (Rab, (Rab1))


The relation name to the left of the arrow refers to the database submodel, while those to the
right refer to the database model. We only consider cases which need additional control from the
constraints. Insert in Ra of dbsm1 hence means insert in Ra of the idbm, since it is not listed. In
dbsm1, insert in Rb requires insertion of the tuple in both Rb and Rb1 of the idbm. Insert in Rab
requires insertion in (Rab, Rab1) in the idbm, the ( ) brackets meaning if the integrity check of the
idbm will allow it, here if both owner tuples exist. In dbsm2, insert in Rab requires insertion in
(Rab, (Rab1)), which means: insert the tuple in Rab if the integrity check of the idbm holds (here
both the owner tuple in Ra and the referenced tuple in Rb exist), then insert the same tuple in
Rab1 (if the other owner tuple exists in Rb1).
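
One way to read the brief notation is as a small rule language, sketched below in Python. This is an interpretation, not the report's definition: a rule is modelled as a list whose items are either a relation name (operate unconditionally) or a tuple standing for a bracketed group (operate on its members only if the idbm integrity check allows every relation named directly in the group, then try any nested group). The function names and the `allowed` callback are hypothetical.

```python
def apply_rule(rule, allowed):
    """Return the idbm relations actually touched by one submodel operation."""
    touched = []
    for item in rule:
        if isinstance(item, str):
            touched.append(item)  # unbracketed name: always performed
        else:
            touched.extend(_apply_group(item, allowed))
    return touched


def _apply_group(group, allowed):
    names = [member for member in group if isinstance(member, str)]
    if not all(allowed(name) for name in names):
        return []  # integrity check failed: skip the whole bracketed group
    touched = list(names)
    for member in group:
        if not isinstance(member, str):
            touched.extend(_apply_group(member, allowed))  # nested brackets
    return touched


# dbsm2, insert in Rab: (Rab, (Rab1))
INSERT_RAB_DBSM2 = [("Rab", ("Rab1",))]
# dbsm1, delete from Rb: (Rb), Rb1
DELETE_RB_DBSM1 = [("Rb",), "Rb1"]

# A referenced Rb tuple cannot be removed from Rb, so only Rb1 is affected.
only_hidden = apply_rule(DELETE_RB_DBSM1, lambda relation: relation != "Rb")
```

With this reading, deleting a referenced Rb tuple from dbsm1 touches only Rb1, which matches the rule stated for figure 10a.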

Following  each  integration  case,  we  will  give  an  example  with  attributes  to  illustrate  the 
integration  process.  Example  1 illustrates  the  integration  of  association  and  nest  of  references. 
Figure 10b. Integration of association and reference

(b) ASSOCIATION AND REFERENCE (figure 10b):

The cardinality of the relationship A:B is restricted to N:1, since the reference cannot represent
an M:N relationship.

Differences:

(1) In dm2, every Ra tuple must reference an Rb tuple, while in dm1 not all Ra tuples have
to be associated with Rb tuples.

(2) In dm2, deletion of Rb tuples is restricted by references.

Additional constraints:
dbsm1:

insert: (1) Rb -> Rb, Rb1, (2) Rab -> [illegible]
delete: (1) Rb -> (Rb), Rb1, (2) Rab -> [illegible]


[Example 2. DBSM1 (association), DBSM2 (reference), and IDBM, over the attributes DEP-NO, LOC, EMP-NO, AGE, SAL. Diagram not recoverable.]

[Figure 10c diagram: dm1 (association), dm2 (nest), idbm. Diagram not recoverable.]

Figure 10c. Integration of association and nest

dbsm2:

insert: (1) Ra -> (Ra, Ra2, (Rab)), (2) Rb -> Rb, Rb1

The requirement that every Ra tuple must reference an Rb tuple in dm2 leads to the creation
of the subrelation Ra2, while the unrestricted deletion of Rb tuples in dm1 leads to the creation
of Rb1 (example 2).

(c) ASSOCIATION AND NEST (figure 10c):

The cardinality of the relationship A:B is restricted to 1:N, since the ownership connection
can only represent 1:N relationships.

Differences:

(1) In dm2, existence of a tuple in Rb requires the existence of the owner tuple in Ra, while
in dm1, Rb tuples can exist independently.

(2) In dm2, deletion of a tuple from Ra requires the deletion of the owned tuples in Rb, while
dm1 does not require these deletions.

Additional  constraints: 
dbsm2: 

insert: Rb -> (Rb, Rb2)


[Example 3. DBSM1 (association), DBSM2 (nest with unique identification), and IDBM, over the attributes DEP-NO, LOC, EMP-NO, AGE, SAL, with some relations marked (U). Diagram not recoverable.]

[Figure 10d diagram: dm1 (association), dm2 (primary), idbm. Diagram not recoverable.]

Figure 10d. Integration of association and primary


The Rb tuples of dbsm2 are only those in Rb2 in the idbm, since they require the existence
of the owner tuple. In the idbm, Rab will also represent the subset of Rb tuples in Rb2.

Here, we must consider two examples, since the nest relation may represent different tuple
identification attributes than the association. First, we consider the case where the identification
is the same. In example 3, EMP-NO identifies the employee in both dbsm1 and dbsm2. Since
the cardinality of DEPARTMENT:EMPLOYEE is 1:N, the EMP-NO attribute must have unique
values in tuples of the relations marked (U). Note that this does not violate Boyce-Codd normal
form. In this case, the integration is straightforward.

In example 4, the identifying information is different. Dbsm2 uses the two attributes
(EMP-NO, CHILD-NAME) as ruling part, while dbsm1 uses only CHILD-ID. CHILD-ID uniquely
identifies every child tuple, but CHILD-NAME does not. Here, if dbsm2 does not represent the
attribute CHILD-ID, it has to be made aware of it to maintain the correct mapping between
CHILD-ID and CHILD-NAME on insertion of child tuples.


(d) ASSOCIATION AND PRIMARY (figure 10d):

The cardinality of the relationship A:B is M:N.

Differences:

In dm2, deletion of Ra and Rb is restricted by references.

Additional constraints:
dbsm1:

insert: (1) Ra -> Ra, Ra1, (2) Rb -> Rb, Rb1, (3) Rab -> Rab, Rab1

delete: (1) Ra -> (Ra), Ra1, (2) Rb -> (Rb), Rb1

dbsm2:

insert: (1) Ra -> Ra, Ra1, (2) Rb -> Rb, Rb1, (3) Rab -> Rab, Rab1


5.3.2.  Integration  with  a nest  of  references: 

Now we consider the cases that remain with nest of references. Dm1 represents the
relationship A:B as a nest of references, and dm2 represents it differently. The cardinality of the nest of
references representation is M:N, but may again be restricted by the representation in dm2. The
nest of references representation is not symmetric with respect to entity classes A and B, and so
we must consider it twice with each non-symmetric representation.

(a) NEST OF REFERENCES AND NEST OF REFERENCES (figure 11a):

Differences:

(1) Deletion of Rb (Ra) is restricted in dm1 (dm2).

(2) Deletion of Ra (Rb) in dm1 (dm2) requires deletion of owned tuples in Rab (Rba).


Additional constraints:
dbsm1:

insert: (1) Ra -> Ra, Ra1, (2) Rb -> Rb, Rb2, (3) Rab -> (Rab, Rab1, (Rba2))
delete: (1) Ra -> (Ra), Ra1, (Rab), (2) Rb -> (Rb, Rab)

dbsm2:

insert: (1) Ra -> Ra, Ra1, (2) Rb -> Rb, Rb2, (3) Rba -> (Rab, Rba2, (Rab1))
delete: (1) Ra -> (Ra, Rab), (2) Rb -> (Rb), Rb2, (Rab)
When dbsm1 tries to delete an Ra tuple in the idbm that is referenced from Rba2, it is only
deleted from Ra1. If the tuple is not referenced from Rba2, the tuples in Rab that correspond to
those deleted from Rab1 (due to the deletion of Ra) should also be deleted, since they no longer
exist in either Rab1 or Rba2. Rab exists to ensure that the tuples associating tuples from Ra with
tuples from Rb are consistent.

Example  5 illustrates  this  case. 


(b) NEST OF REFERENCES AND REFERENCE (figure 11b, 11c):

Both nest of references and reference are non-symmetric, so we must examine two cases.

Case 1 (figure 11b):

The cardinality of the relationship A:B is restricted to N:1, since the reference cannot represent
an N:M relationship.
Figure 11c. Integration of nest of references and reference (Case 2)

Differences:

A tuple in Ra in dm2 must be associated with an Rb tuple.

Additional constraints:
dbsm2:

insert: Ra -> (Ra, Ra2)

Case 2 (figure 11c):

Again, the cardinality of the relationship A:B is restricted to 1:N.
Figure 11d. Integration of nest of references and nest (Case 1)


Differences:

(1) Deletion of Rb (Ra) tuples is restricted in dm1 (dm2).

(2) Every Rb tuple in dm2 must be related to an Ra tuple.

Additional  constraints: 
dbsml: 

insert:  (1)  Ro  - Ra,Roi,  (2)  R0e  - (Rokii  Ro6>^m)  • 

delete:  (1)  R*  . (Ro),R.i,(IU),  (2)  R*  - (R6,R«t),  (3)  Roe  - Ro*i,  (R-fc). 

dbsm2:

insert:  (1)  Ra  - RaiRaii  (2)  Re  * (Rei  RtJ<Roe<Roei)' 
delete:  (1)  R„  - (R,,Rok),  (2)  Rk  - (Rk,  Rflk),Rb2. 

Example  7 illustrates  this  case. 

(c) NEST OF REFERENCES AND NEST (figure 11d, 11e):

Again, both nest of references and nest are non-symmetric, so we must examine two cases.

Case 1 (figure 11d):

The cardinality of the relationship A:B is restricted to 1:N, since the reference cannot represent
an N:M relationship.

Differences:

(1) Rb tuples may exist independently in dm1.

(2) Deletion of Rb tuples is restricted in dm1.

Additional  constraints: 
dbsml: 

insert:  Rq6  * (RokiRw)' 
dbsm2: 

insert:  Ra  - (Rt.Rob.Rtj). 
delete:  Ra  - (Ra),Ra2- 


We again consider two examples, because of the different ways the nest relation may
represent the tuple identifying information. In example 8, we consider the case where the identifying
information is the same.

In example 9, we now consider the case where the identifying information is different. Here,
we must slightly change dm2 by introducing an additional attribute.

Case 2 (figure 11e):

The cardinality of the relationship A:B is restricted to N:1.

Differences:

(1) In dm1, Ra tuples can exist independently, while in dm2 an owner Rb tuple must
exist.

(2) In dm1, deletion of Rb tuples is restricted by references, while in dm2, deletion of an Rb
tuple requires deletion of related Ra tuples.

Figure 11e. Integration of nest of references and nest (Case 2)


Additional constraints:
dbsm1:

insert:  (1)  R^  - R*/R*2,  (2)  Rai  - (Rot,  (Rjj))' 
dbsm2:

insert:  (1)  R«  - (R«,Ro2,Roe),  (2)  Rt  - RblRb2~ 
delete  • (Rfc),Rfcj. 

We will only consider one example for this case, example 10, with different identification.

(d) NEST OF REFERENCES AND PRIMARY (figure 11f):

The cardinality of the relationship A:B is M:N.

Differences:

In dm2, deletion of Ra is restricted by references.

Figure 11f. Integration of nest of references and primary


Additional constraints:
dbsm1:

insert: (1) Ra -> Ra, Ra1, (2) Rab -> [illegible]

delete: (1) Ra -> (Ra), Ra1

dbsm2:

insert: (1) Ra -> Ra, Ra1, (2) Rab -> Rab, Rab1


5.3.3.  Integration  with  a reference: 

Dm1 represents the relationship A:B as a reference connection from Ra to Rb, and dm2
represents it using a different structure. The cardinality of the relationship A:B is N:1, possibly
restricted by the dm2 representation.

(a) REFERENCE AND REFERENCE (figure 12a):

The cardinality of A:B is restricted to 1:1, since in dm1 it is N:1, and in dm2 it is 1:N. It
would be unusual to encounter these two representations of the same 1:1 relationship. However,
it can be integrated.

Figure 12a. Integration of reference and reference

Figure 12b. Integration of reference and nest (Case 1)


Differences:

(1) In dm1 (dm2), every Ra (Rb) tuple must reference an Rb (Ra) tuple.

(2) Deletion of Rb (Ra) tuples is restricted in dm1 (dm2).

Additional  constraints: 
dbsml: 

insert:  R0  - (Ro,R0i,R06,Rw)- 
delete:  R«  - (R0),Rai>(Rab)- 

dbsm2: 

insert:  Rk  • (Rb,Rb2,Rab,Rbj). 
delete:  Rk  - (Rb),Rb2.(Rob)- 


(b) REFERENCE AND NEST (figure 12b, 12c):

Case 1 (figure 12b):

The cardinality of the relationship A:B is N:1.


[Example 12. IDBM over the attributes DEP-NO, LOC, EMP-NO, AGE, SAL, with some relations marked (U). Diagram not recoverable.]

[Figure 12c diagram: dm1 (reference), dm2 (nest), idbm. Diagram not recoverable.]

Figure 12c. Integration of reference and nest (Case 2)


Differences:

(1) In dbsm1, deletion of Rb tuples is restricted by referencing.

(2) In dbsm2, deletion of an Rb tuple requires deletion of related tuples in Ra.

Additional constraints:
dbsm1:

insert: (1) Ra -> (Ra, (Ra2)), (2) Rb -> Rb, Rb2
dbsm2:

insert: (1) Ra -> (Ra, Ra2), (2) Rb -> Rb, Rb2
delete: Rb -> (Rb), Rb2

Example 12 illustrates this case by a 1:N relationship between DEPARTMENTS:EMPLOYEES.

Case 2 (figure 12c):

The cardinality of the relationship A:B is restricted to 1:1, since in dm1 it is N:1, and in
dm2 it is 1:N.

Differences:

(1) Every Ra tuple in dm1 must reference an Rb tuple, while in dm2 every Rb tuple must be
owned by an Ra tuple.

(2) Deletion of Rb tuples is restricted in dm1.

(3) Deletion of an Ra tuple in dm2 requires deletion of owned Rb tuples.

Figure 12d. Integration of reference and primary

Additional  constraints: 
dbsml: 

insert:  R*  - (Ro.Roi.Rfrj) 
dbsm2: 

insert:  Re  * (Re.Roi.Rfcj) 
delete:  R*  • (R») .Res 

Example  13  illustrates  this  case. 

(c) REFERENCE AND PRIMARY (figure 12d):

The cardinality of the relationship A:B is N:1.

Differences:

In dm2, deletion of Ra is restricted by references.

Additional constraints:
dbsm1:

insert:  Re  • Re,  (R*i 
delete:  (1)  Rb  ■ (Rkl,  (R*)j 
dbsm2: 

insert:  (1)  R*  • Rfc,  R*i,  (2)  R«t  • R,6,  R.i 
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Example  14 


5.3.4. Integration with a nest:

(a) NEST AND NEST (figure 13a):

The cardinality of A:B is restricted to 1:1.

Differences:

(1) In dm1 (dm2), every Rb (Ra) tuple must be owned by an Ra (Rb) tuple.

(2) Deletion of an Ra (Rb) tuple in dm1 (dm2) requires deletion of the owned Rb (Ra) tuple.

Additional constraints:
dbsm1:

insert:  Re  • (Re.Rei) 
dbsm2:

insert:  R,  - (Re, Rea) 

Example 14 illustrates this case.
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Figure  13b.  Integration  of  nest  and  primary 

(b) NEST AND PRIMARY (figure 13b):

The cardinality of the relationship A:B is 1:N.

Differences:

(1) In dm2, deletion of Ra is restricted by references.

(2) In dm1, deletion of an Ra tuple results in deletion of owned Rb tuples, while in dm2 deletion
of Rb is restricted by references.

Additional constraints:
dbsm1:

insert:  (1)  R0  - R«,  R0i,  (2)  Rt  - (Rt,  Rti) 

delete:  (1)  R(,  • Rw,  (Rb) 

dbsm2: 

insert:  (1)  Ro  • Ra,  Rai,  (2)  R0s  • Ra6>  Rti 
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6.  RELATIONSHIP  TO  OTHER  MODELS 


In this section, we examine some of the similarities between the structural model and other
data models.


6.1.  The  relational  model: 

In relational model theory, the concepts of functional [Codd72] and multivalued [Fagin77]
dependency among attributes are important for normalization of relations and data model design.
A functional dependency between two attributes Ai and Aj, denoted by Ai => Aj, means that for
each value of DOM(Ai), a unique corresponding value of DOM(Aj) can be determined. Functional
dependency between two sets of attributes is defined correspondingly. Attributes of a relation R
in Boyce-Codd normal form obey the constraint: if any attribute A in R is functionally dependent
on a set of attributes X in R, and A is not in the set X, then all attributes in R are functionally
dependent on X.
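
As an illustration only, the following sketch tests whether a dependency X => A is contradicted by a given set of tuples. The function name fd_holds and the sample rows are assumptions made for this example. A functional dependency is a constraint on all legal instances of a relation, so a passing check on one instance shows only that this instance does not violate it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val on first sight and returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical tuples of an unnormalized employee relation.
rows = [
    {"EMP_NO": 1, "DEP_NO": 10, "LOC": "BUILDING A"},
    {"EMP_NO": 2, "DEP_NO": 10, "LOC": "BUILDING A"},
    {"EMP_NO": 3, "DEP_NO": 20, "LOC": "BUILDING B"},
]
print(fd_holds(rows, ["DEP_NO"], ["LOC"]))  # DEP_NO => LOC on this instance
print(fd_holds(rows, ["LOC"], ["EMP_NO"]))  # LOC => EMP_NO on this instance
```

In this sample, DEP_NO determines LOC but is not a key of the relation, so by the constraint stated above the relation would not be in Boyce-Codd normal form.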

All relations in the structural model are in Boyce-Codd normal form, and hence obey the above
constraint. A functional dependency will also exist between each attribute in a referenced relation
and the ruling part of the referencing relation. Hence, a reference connection from a relation R
to another relation R'[A1,...,As] defines s functional dependencies K(R) => Aj; j = 1,...,s. This
is so because a functional dependency K(R) => Xr will exist in relation R, where Xr is the set
of referencing attributes in R. We will also have the functional dependencies Xr => K(R') and
K(R') => Aj; j = 1,...,s. From the transitivity rule for functional dependencies, it follows that
K(R) => Aj; j = 1,...,s.
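
The transitivity argument can be traced mechanically. The sketch below computes the closure of an attribute set under a list of dependencies; the attribute names EMP_NO, E.DEP_NO, D.DEP_NO and LOC are hypothetical stand-ins for K(R), Xr, K(R') and one attribute Aj of the referenced relation.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs => rhs once its whole left side has been derived.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (("EMP_NO",), ("E.DEP_NO",)),    # K(R)  => Xr
    (("E.DEP_NO",), ("D.DEP_NO",)),  # Xr    => K(R')
    (("D.DEP_NO",), ("LOC",)),       # K(R') => Aj
]
print(sorted(closure({"EMP_NO"}, fds)))
```

Because LOC appears in the closure of EMP_NO, the dependency K(R) => Aj follows from the three listed dependencies.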

Since the structural model is constructed from relations, a relational query system based on
the relational algebra or the relational calculus can be used on the structural model. However,
additional capabilities exist in the structural representation to simplify the expression of queries
by making use of represented connections. For example, consider the structural schema in figure
8a. A query such as “FIND THE WORK LOCATION OF EMPLOYEE NUMBER 5” does not
have to be expressed as a join between two relations, since the reference connection specifies the
department tuple that corresponds to the tuple representing employee number 5.
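
A sketch of the idea follows. It assumes that the schema relates an employees relation to a departments relation through a DEP_NO reference and that the location is an attribute LOC of the department; the attribute names are borrowed from Example 12, and the rows and the function name work_location are invented for illustration.

```python
# Hypothetical relations, keyed by their ruling part.
employees = {5: {"DEP_NO": 10}, 6: {"DEP_NO": 20}}
departments = {10: {"LOC": "BUILDING A"}, 20: {"LOC": "BUILDING B"}}


def work_location(emp_no, employees, departments):
    """Follow the reference connection from an employee to its department."""
    employee = employees[emp_no]
    department = departments[employee["DEP_NO"]]  # the referenced tuple
    return department["LOC"]


print(work_location(5, employees, departments))
```

The caller names only the employee and the wanted attribute; the connection supplies the matching condition that a join would otherwise have to state.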

In the relational model, one has to specify integrity constraints to maintain tuples in different
relations in a consistent state. In the structural model, such constraints may be specified implicitly
via connections.
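
One way to picture an implicitly specified constraint is a connection object that guards insertions. This is a minimal sketch under assumed names (Relation, ReferenceConnection, insert); it is not the report's notation and covers only the existence check implied by a reference connection.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str
    key: str                                     # ruling-part attribute
    tuples: dict = field(default_factory=dict)   # key value -> tuple


@dataclass
class ReferenceConnection:
    referencing: Relation
    referenced: Relation
    attr: str  # referencing attribute in the referencing relation

    def insert(self, row):
        """Insert a referencing tuple only if its referenced tuple exists."""
        if row[self.attr] not in self.referenced.tuples:
            raise ValueError(
                f"no {self.referenced.name} tuple for {row[self.attr]!r}"
            )
        self.referencing.tuples[row[self.referencing.key]] = row
```

Declaring the connection once replaces a separately written integrity rule for every update path that inserts into the referencing relation.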

6.2.  The  hierarchical  model: 

A hierarchical model can be expressed using relations as record types and ownership
connections as hierarchical arcs. Hence, if a structural model is restricted such that only ownership
connections are used, and such that all relations are connected together in a tree structure, a
hierarchical definition tree would result. The difference in representation is the redundancy created
by repetition of the ruling part attributes of the owner relations in the owned relations. However,
such redundancy need not be implemented in a hierarchical implementation.
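
The restriction can be checked mechanically. The sketch below assumes that ownership connections are given as (owner, owned) pairs of relation names; the function name is_definition_tree and the relation names in the usage note are hypothetical.

```python
def is_definition_tree(relations, ownerships):
    """True if the ownership connections link all relations into one tree."""
    owner_of = {}
    for owner, owned in ownerships:
        if owned in owner_of:
            return False  # a relation with two owners is not hierarchical
        owner_of[owned] = owner
    roots = [r for r in relations if r not in owner_of]
    if len(roots) != 1:
        return False  # a tree has exactly one root
    for r in relations:
        seen = set()
        while r in owner_of:  # walk up the chain of owners
            if r in seen:
                return False  # a cycle among owned relations
            seen.add(r)
            r = owner_of[r]
        if r != roots[0]:
            return False  # the chain ended outside the tree
    return True
```

For example, DEPARTMENTS owning EMPLOYEES owning CHILDREN passes the check, while an EMPLOYEES relation owned by both DEPARTMENTS and PROJECTS does not.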

6.3.  The  network  model: 

The link set concept of the network model can be represented in the structural model. An
automatic set can be defined using an ownership connection. Again, the only difference is the
redundant representation of the ruling part attributes from the owner relation in the owned relation.
However, the existence of the connecting attributes implicitly specifies the set occurrence when
a new member tuple is inserted in the data model, without requiring an additional procedure to
specify the correct owner.

A manual set can be represented by a 1:N association between two relations, as in the
example of figure 8d. Here, the DEPARTMENTS relation corresponds to the owner type, and the
EMPLOYEES relation to the member type. Employee tuples can exist without belonging to any
department tuple, and the set of members of each department owner tuple is specified via the
association.
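
The two kinds of set membership can be contrasted in a short sketch. It assumes dictionaries and pairs in place of relations; the function names automatic_owner and manual_members and the sample values are illustrative only.

```python
def automatic_owner(member, connecting_attrs, owners):
    """Automatic set: the member's connecting attributes identify its owner."""
    key = tuple(member[a] for a in connecting_attrs)
    return owners.get(key)  # None if no such owner tuple exists


def manual_members(owner_key, association):
    """Manual set: membership is whatever the association relation records."""
    return sorted(m for owner, m in association if owner == owner_key)


# Hypothetical data: owners keyed by ruling part, association as key pairs.
owners = {(10,): {"LOC": "BUILDING A"}}
association = {(10, 1), (10, 3), (20, 2)}
print(automatic_owner({"EMP_NO": 1, "DEP_NO": 10}, ["DEP_NO"], owners))
print(manual_members(10, association))
```

In the automatic case no separate step selects the owner, since the inserted tuple already carries the owner's ruling part. In the manual case an employee absent from the association simply belongs to no department.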

The implementation-oriented features of the hierarchical and network models are implementation
dependent, and hence are best left to the implementation phase. We note that the structural
model may represent structures that are not part of any of the three other models, as shown in
section 4.2.
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7.  THE  DATABASE  DESIGN  PROCESS 


This  section  summarizes  the  process  of  designing  the  database  with  the  aid  of  the  structural 
model,  and  provides  a brief  discussion  of  considerations  for  model  implementation.  The  approach, 
which  we  only  outline  here,  provides  much  of  the  motivation  for  concepts  presented  in  the  structural 
model.  A detailed  description  and  analysis  of  the  remaining  steps  of  the  database  design  process 
will  be  the  subject  of  a later  report. 

An  overview  of  the  entire  design  process  for  an  integrated  database  system  is  given  in  figure 
14.  We  define  three  groups  of  people  that  partake  in  the  design  process:  the  potential  users,  the 
integrators,  and  the  implementors.  These  groups  will  interact  during  the  database  design  process. 
The  vertical  axis  in  figure  14  defines  the  activities  of  each  group  relative  to  a time  frame. 

A potential  user  is  a group  of  people  or  application  programs  that  expect  to  use  the  database 
system.  Many  such  potential  users  will  exist  since  we  are  designing  a large,  integrated  database. 
Each  potential  user  must  analyse  his  requirements,  and  define  a data  model  with  expected  load 
estimates.  Since  a database  typically  serves  many  diverse  but  potentially  related  interests,  many 
such  data  models  can  be  established. 

In section 4.1, we showed how the structural model guides the design of data models. Additional
information  is  solicited  from  the  potential  user  about  his  expected  use  of  his  data  model.  This 
information  is  not  part  of  the  data  model,  but  is  attached  to  the  relations  and  connections  of 
his  data  model.  This  includes  additional  integrity  constraints,  and  expected  retrieval  and  update 
characteristics  for  the  data  model.  Load  estimates  will  be  classified  into  several  update  and  retrieval 
components  on  the  relations  and  connections  of  the  data  model. 

The database integrators then undertake to combine these data models into an integrated
database model. In the process of combining the data models, conflicts may arise which have to
be resolved by changing some of the data models. There may be data models which turn out to
be unrelated, or weakly related, to the core of the integrated database model so that they are not
included. The result of the data model integration is to define preliminary database submodels for
the user groups. This process will need consultation with the users if their data model has to be
changed.

The  integrators  then  combine  the  load  estimates  from  the  individual  user  data  models  and 
produce  load  estimates  for  the  database  model.  When  the  transformation  from  data  models  to 
database  model  is  simple,  the  load  estimates  can  simply  be  added  together.  In  complex  situations, 
load  data  will  have  to  be  transformed  to  correspond  to  the  transformation  from  the  data  models 
to  the  database  model. 
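
The simple and the complex case can be sketched together. The example assumes that each user's load estimate is a mapping from a relation or connection name to an expected number of operations per day, and that a transformation is given as a mapping from a data model element to the database model elements that replace it. Both representations, the unit, and the function name combine_loads are assumptions; the report defers the actual method to a later report.

```python
from collections import Counter


def combine_loads(user_loads, mapping=None):
    """Sum per-user load estimates onto the elements of the database model."""
    total = Counter()
    for loads in user_loads:
        for element, ops_per_day in loads.items():
            # Without a mapping the element carries over unchanged.
            targets = mapping.get(element, [element]) if mapping else [element]
            for target in targets:
                total[target] += ops_per_day
    return dict(total)


# Simple case: identical element names, so the loads are added together.
print(combine_loads([{"EMPLOYEES": 100, "DEPARTMENTS": 5}, {"EMPLOYEES": 40}]))
```

Charging the full load to every target element is the crudest possible transformation; a real design would presumably apportion it according to how each operation is rewritten.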

The implementors then use the cumulative load estimates on relations and connections to
design  the  file  structures  and  access  methods.  They  take  into  account  the  expected  update  and 
retrieval  loads  for  the  database  model.  Connections  that  are  expected  to  be  used  frequently  should 
be  explicitly  represented  in  the  implementation.  A methodology  for  designing  the  file  structures 
and  access  methods  based  on  the  expected  update  and  retrieval  characteristics  of  relations  and 
connections  will  be  described  in  a later  report. 

When  usage  patterns  change,  it  is  reasonable  to  change  implemented  file  structures  and  access 
methods  without  affecting  the  structural  database  model.  Only  the  performance  of  retrieval  and 
updates  along  model  relations  and  connections  whose  implementation  is  changed  will  be  affected. 
Provisions  should  be  made  in  the  implementation  for  such  a restructuring. 

Provisions  must  also  be  made  for  changing  user  data  models,  or  for  deletion  of  existing 
submodels  and  addition  of  new  data  models.  This  may  cause  a change  in  the  database  model. 
Structural model changes which only affect rarely used connections will be easier to accommodate
than  changes  which  affect  very  critical  and  tightly  bound  connections. 

[Diagram: activities of the USERS, INTEGRATORS and IMPLEMENTORS plotted against a vertical
time axis. Activities shown: requirements analysis; data models construction; load estimates on
data models; data models integration into database model; database submodels defined on database
model; load integration onto database model; alternative file selections analysis; alternative
performance estimates; file design decision; performance prediction.]

Figure 14. The integrated database design process


We  will  address  the  issue  of  database  design  and  implementation  using  the  structural  model 
in  a separate  report.  We  will  give  a quantitative  approach  to  database  design,  and  discuss  possible 
implementation  choices  for  the  structural  model  constructs. 


8. CONCLUSIONS

The model we have presented provides a bridge between the simplicity of the relational model
and the explicitness of the network model. On the one hand, all structures in the model are relations
in Boyce-Codd normal form so that the uniformity of the relational model is maintained. Query
techniques devised for relational models can be easily incorporated into the structural model. On
the other hand, important structural information about the real-world situation is incorporated
in the data model, and provides important knowledge both for potential users and for database
system implementors.

We  then  showed  how  the  different  representations  in  two  data  models  can  be  integrated, 
leading  to  the  construction  of  an  integrated  database  model  which  correctly  supports  the  different 
data  models  of  the  users.  The  integrated  database  model  then  supports  the  user  submodels. 

Our point of view of the implementation process is that connections between relations have
to be carefully considered. Binding important connections allows reasonable levels of performance
to be achieved. At the same time, unbound connections remain recognized, and may be
employed when restructuring due to changing demands becomes necessary. The decision of which
connections to bind is best supported by inclusion of connections which are candidates for binding
in the database model.
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